Abstract: We consider the problem of maximizing the alpha-fairness
utility over the downlink of a heterogeneous wireless
network (HetNet) by jointly optimizing the association of users 
to transmission points (TPs) and the activation fractions of all 
TPs. Activation fraction of each TP is the fraction of the frame 
duration for which it is active, and together these fractions 
influence the interference seen in the network. To address this 
joint optimization problem we adopt an approach wherein the 
activation fractions and the user associations are optimized in 
an alternating manner. The sub-problem of determining the 
optimal activation fractions is solved using an auxiliary function 
method that we show is provably convergent and is amenable to 
distributed implementation. On the other hand, the sub-problem 
of determining the user association is solved via a simple combinatorial
algorithm. Meaningful performance guarantees are
derived and a distributed variant offering identical guarantees 
is also proposed. The significant benefits of using the proposed 
algorithms are then demonstrated via realistic simulations. 

I. Introduction 

It is well established by now that future cellular networks 
will be dense HetNets formed by a multitude of disparate 
transmission points deployed in a highly irregular fashion 
[1]. In a majority of these deployments, the transmission
points (TPs) will be connected to each other by a non-ideal 
backhaul with a relatively high latency (several dozens of 
milliseconds). An unfortunate consequence of such a high 
latency is that it renders unsuitable resource management 
(RM) schemes that strive to coordinate and obtain allocation 
decisions within a fine time-scale (e.g., 1 ms in LTE HetNets) 
M- Instead, semi-static resource management schemes 
where RM is performed at two time scales are better suited
since they are more robust towards backhaul latency. Broadly
speaking, in any such semi-static scheme the RM that is done
at a coarse frame-level granularity (that is at least as large as
the backhaul latency) entails coordination among TPs based on 
averaged (not instantaneous) slowly varying metrics. On the 
other hand, the RM in such a scheme that is done at a much 
finer slot level granularity involves no coordination among TPs 
and is done independently by each active TP based on fast 
changing metrics ©-G3- The semi-static scheme that we 
propose in this paper decides at the onset of each frame which
set of users each TP should serve over that frame, such that
each user is served by exactly one TP (user association), and
how often each TP should transmit over that frame (activation
fraction of that TP).

The problem at hand is quite challenging due to the well 
recognized interference coupling problem. Indeed, while increasing
the activation fraction (AF) of a TP will help it
serve more users (or serve a given set of users better), it
injects more interference to all users being served by other 
TPs. User Association (without AF optimization) is by itself a 
popular HetNet RM scheme, wherein the interference coupling 
problem is simplified by assuming that the interference that 
would be seen by any user upon being associated to any TP 
remains static. Association is then determined by optimizing 
a system utility G3- fTT) , or by minimizing a cost function 
given traffic demands [9], or by adopting a game theoretic
framework COD- Joint optimization of user association along 
with other system resources, such as power and bandwidth 
in the downlink @, pg-pa, [19j] and user powers and TP 
locations in the uplink~pO), 1211, has also received significant 
attention. Considering the downlink which is our focus in this 
paper, we see that the alternating optimization framework is a 
popular approach to ensure tractability, and that binary (on-off) 
power control has been found to be particularly effective in 
terms of being robust and capturing most of the available gains 
with a small signalling footprint. The latter observation has led 
to another promising downlink semi-static RM technique that 
is fully compliant with the LTE standard, and seeks to capture 
the benefits of slot-level coordinated binary power control over 
a HetNet with a non-ideal backhaul. This scheme combines
user association with partial muting of the high power Macro 
TP, i.e., the Macro TP is allowed to be active (or transmit with 
a pre-determined power) for any fraction of the total number 
of slots in a frame. The choice of this AF for the macro TP 
is optimized together with the user association [22], [23]. The
macro TP then adopts a muting pattern (which includes its 
on-off status on all the slots) conforming to the determined 
AF. Notice that the exact on-off status of the macro TP on all 
the slots is not optimized. Indeed, doing so can be detrimental 
since coordination done at a coarse time-scale based on the 
available averaged metrics cannot adapt to the fast changing 
channel and interference conditions seen across the slots. 

Recent studies have shown that topologies without one 
common dominant interferer will be ubiquitous and in such 
cases optimizing the AF of only one TP is not enough. The 
problem we seek to solve is geared exactly towards such 
deployments. One attempt to solve our problem would be to 
extend the solutions proposed for the aforementioned scheme, 
but then it becomes immediately clear that those solutions 
do not scale when activation fractions for all TPs have to be 
optimized. This is because those solutions explicitly maintain 
a rate for each TP-user link under each possible interference 
pattern, and the number of such patterns grows exponentially in
the number of TPs. In
this paper, we propose a simple formulation that imposes
activation fractions and yields one average rate expression for 
each TP-user link. The latter expression is conservative and is a 
closed-form function of all activation fractions. Interestingly, 
in the absence of fast fading our rate expression reduces to 
the approximate rate expression introduced in [24] (see also
©), which considered the problem of determining activation 
fractions to meet a given set of user traffic demands for a 
given user association. We confirm the observation made in 
those works that the rate expression is in fact quite accurate
over practical HetNets. Our main contributions are as follows: 


• We adopt the $\alpha$-fairness utility as the system-wide utility,
which generalizes all popular utility functions [26], wherein
we also allow for assigning any arbitrary set of weights
(reflecting priorities) to the users. We develop centralized
and distributed algorithms that yield good solutions for any
given fairness parameter $\alpha$. These algorithms are obtained
by adopting an alternating optimization based approach. The
latter approach is well justified since the problem at hand is
intractable and our goal is to obtain unified low-complexity
algorithms that are suitable for all $\alpha$.

• For the discrete user-association sub-problem, we first prove
that this sub-problem itself is NP-hard and proceed to completely
characterize the underlying set function that needs to
be optimized. We then suggest and comprehensively analyze
a simple centralized combinatorial algorithm (referred to as
the GLS algorithm) that involves a Greedy stage followed by
Local Search improvements. Our analysis yields meaningful,
novel and readily computable performance guarantees for all
$\alpha$. Previous related works have considered the proportional
fairness (PF) utility and proposed combinatorial user association
algorithms 03 , 03 - Our results when specialized
to the case of the weighted PF utility (by setting $\alpha = 1$)
reveal that GLS is optimal up to a constant additive factor
of $-2\ln(2)$. Thus, a simple algorithm yields optimality up
to an additive constant factor, a fact that was hitherto only
established for a significantly more complex algorithm [15]
(whose run-time can depend on the input weights). Upon
further specializing to the case with identical user weights,
we see that the guarantee proved for a greedy algorithm
in 03 has an instance dependent (non constant) additive
factor. Interestingly, our simulation results indicate that in
this special case the association yielded by GLS is identical
to the optimal one obtained via another more complex
algorithm from G3-

• We derive a distributed version of the GLS algorithm and
prove that, remarkably, it provides guarantees identical to
its centralized counterpart. This distributed version requires
network assistance in the form of periodic broadcast of
system load information similar to that proposed earlier
in [27]. The main novelty of our approach is that we are
able to configure each user to consider the system utility
gain in contrast to the selfish gain used in the user-centric
approach adopted by [18], [27] and more recently in [17].
Consequently, we can establish guarantees (with respect to
the optimal system utility) and provable convergence for
our distributed algorithms for all $\alpha$. We note here that convergence
of the user-centric approach to a Nash equilibrium
was proved in [18] for particular choices of $\alpha$ and the recent
and independent work in 03 has identified conditions under
which the Nash equilibrium is (near-)optimal.

• For the continuous AF optimization sub-problem we adopt 
the auxiliary function method and show that it is provably 
convergent. Such a method has been used for precoder 
optimization originally over the single-cell downlink in 
[28] and over the multi-cell downlink in (3 followed by
i*F 03- We note however that unlike those works we 
incorporate fading coefficients that change at two different 
time scales. Further, a key step in our case entails a novel 
GP formulation, which we show can also be implemented 
in a distributed manner. 

• Finally, the performance of all our algorithms is compared 
to appropriate baselines via extensive simulations over a 
HetNet topology generated as per 3GPP LTE guidelines. Our 
results highlight the significant gains that can be achieved 
in realistic HetNet deployments via the joint optimization. 

II. Problem Statement 

Considering the downlink in a HetNet, let $\mathcal{U} = \{1, \cdots, K\}$
denote the set of users and let $\mathcal{B}$ denote the set of transmission
points (TPs) with cardinality $|\mathcal{B}| = B$. Further, suppose that
the time axis is divided into multiple frames, where each 
frame consists of several consecutive slots. The fast fading 
coefficients for each user are assumed to change across slots in 
an independent identically distributed (i.i.d.) manner, while the 
slow fading coefficients are assumed to change across frames 
in an i.i.d. manner. The choice of the activation fraction for 
each TP along with the user association for all TPs is made 
once for each frame to optimize the system utility. This choice 
can be based on the slow fading realization in that frame but 
does not consider any previous such choices. Each TP then 
independently implements its per-slot scheduling policy over 
the users associated with it in that frame, where the latter 
scheduling policy respects the assigned activation fraction and 
can exploit the instantaneous fast fading coefficients seen 
by the associated users on each slot. Consequently, we can 
suppress the dependence on the frame and slot indices in the 
following. 

In order to formulate an optimization problem for determining
the user association and activation fractions, we
derive an average rate that each user can obtain over a frame
of interest, under any given user association and activation
fractions. Towards this end, let $\mathcal{U}^{(b)}, \forall\, b \in \mathcal{B}$ denote any
given set of users associated to TP $b$ over the frame and let
$\mathbf{p} = [p_b]_{b \in \mathcal{B}}$ denote the activation vector, where $p_b \in [0,1]$
denotes the activation fraction assigned to TP $b$. We proceed
by assuming that each TP $b$ allocates a fraction $\gamma_{k,b} \in [0,1]$
of the frame to serve each associated user $k \in \mathcal{U}^{(b)}$, such that
$\sum_{k \in \mathcal{U}^{(b)}} \gamma_{k,b} = 1$, where these fractions are determined at the
onset of the frame. In particular, each TP is assumed to adopt
an optimal fractional round robin per-slot scheduling policy.
Note that an efficient per-slot scheduling policy (cf. [29])
that can adapt to the instantaneous fading and interference
conditions seen across all the slots will be at least as good (in
terms of optimizing the given utility). Next, we assume that the
activation fraction of each TP $b$ is implemented via a Bernoulli
random variable $X_b$ with $E[X_b] = p_b$, that is i.i.d. across slots
in the frame and is independent of all other random variables.
Specifically, TP $b$ is assumed to transmit (with a fixed power)
when $X_b = 1$ and remain silent otherwise. Then, an average
rate that can be achieved for user $k \in \mathcal{U}^{(b)}$ is given by


$$\gamma_{k,b}\, p_b\, E\left[ \log\left( 1 + \frac{\beta_{k,b}}{1 + \sum_{b' \neq b} \beta_{k,b'} X_{b'}} \right) \right], \tag{1}$$


where the desired channel gain $\beta_{k,b}$ and the interfering
channel gains $\{\beta_{k,b'}\}$ are random variables that include both
fast and slow fading as well as noise normalized transmit
powers, and the expectation is over the activation variables
as well as the fast fading. Upon invoking the fact that the
instantaneous rate in (1) is convex in the activation variables,
which we recall are independent of the fast fading coefficients,
we can further lower bound (1) (via Jensen's inequality) to obtain


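$$r_k = \gamma_{k,b} \underbrace{p_b\, E\left[ \log\left( 1 + \frac{\beta_{k,b}}{1 + \sum_{b' \neq b} \beta_{k,b'} p_{b'}} \right) \right]}_{= R_{k,b}(\mathbf{p})}, \tag{2}$$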


where now the expectation is over only the fast fading.
Note that $r_k$ in (2) depends on the slow fading realization
(comprising the path losses and shadowing factors) over
the frame of interest. Letting $\mathbf{r} = [r_1, \cdots, r_K]$ denote the
vector of such conservative rates obtained for all the $K$ users
over the frame, the achieved system utility is given by

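$$\sum_{k \in \mathcal{U}} w_k\, u(r_k, \alpha), \tag{3}$$

where $\alpha > 0$ is a tunable fairness parameter and

$$u(r_k, \alpha) = \begin{cases} \dfrac{r_k^{(1-\alpha)}}{1-\alpha}, & \alpha \in (0,1) \cup (1,\infty), \\ \log(r_k), & \alpha = 1, \end{cases} \tag{4}$$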


and $w_k > 0$ denotes the weight of user $k \in \mathcal{U}$. These weights
can be used to assign different priorities to different users and
we assume that they are normalized, i.e., $\sum_{k \in \mathcal{U}} w_k = 1$. We
can now write our problem, which is a mixed optimization
problem, as


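$$\max_{\substack{\mathbf{p} \in [0,1]^B;\ x_{k,b} \in \{0,1\}; \\ \gamma_{k,b} \in [0,1]\ \forall\, k,b}} \left\{ \sum_{k \in \mathcal{U}} \sum_{b \in \mathcal{B}} x_{k,b} \left( w_k\, u(\gamma_{k,b} R_{k,b}(\mathbf{p}), \alpha) \right) \right\}$$
$$\text{s.t.} \quad \sum_{b \in \mathcal{B}} x_{k,b} = 1,\ \forall\, k \in \mathcal{U}; \qquad \sum_{k \in \mathcal{U}} \gamma_{k,b} = 1,\ \forall\, b \in \mathcal{B}. \tag{5}$$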


Note that in (5) the binary variable $x_{k,b}$ is one if user $k$
is associated to TP $b$ and zero otherwise, so that the first
set of constraints ensures that each user is associated with
only one TP. Consequently, $\mathcal{U}^{(b)} = \{k : x_{k,b} = 1\}_{k \in \mathcal{U}}$
yields the user set associated with TP $b$. Note that in (5)
we enforce $\{\mathcal{U}^{(b)}\}_{b \in \mathcal{B}}$ to be a partition of $\mathcal{U}$. This is meaningful
and indeed important since we are targeting short-term
optimality by maximizing a system utility independently over
each frame. The joint optimization problem in (5) is unfortunately
intractable. Consequently, we develop an alternating
optimization framework to solve the joint problem in (5).
We will demonstrate that although the user association and
activation fractions are optimized assuming conservative rates
and optimal fractional round robin per-slot scheduling policies
at all TPs, the obtained solution retains its significant gains
even without these assumptions. To improve readability the
proofs of all the following propositions are deferred to the
appendix.
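The rate model of this section can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes fast fading is absent (so the gains are deterministic), and the function names `exact_rate`, `conservative_rate` and `utility` as well as the sample gains are hypothetical. The sketch compares the expectation in (1), which enumerates every on-off pattern of the interferers, against the conservative closed form $R_{k,b}(\mathbf{p})$ in (2), and evaluates the utility in (4).

```python
import itertools
import math

def exact_rate(beta, b, p):
    """Expectation in (1) for gamma = 1 with deterministic gains beta[j].
    Enumerates all 2^(B-1) on-off patterns of the interfering TPs."""
    others = [j for j in range(len(p)) if j != b]
    total = 0.0
    for pattern in itertools.product([0, 1], repeat=len(others)):
        prob, interference = 1.0, 0.0
        for j, x in zip(others, pattern):
            prob *= p[j] if x else 1.0 - p[j]
            interference += beta[j] * x
        total += prob * math.log(1.0 + beta[b] / (1.0 + interference))
    return p[b] * total

def conservative_rate(beta, b, p):
    """R_{k,b}(p) in (2) without fast fading: activations replaced by their means."""
    interference = sum(beta[j] * p[j] for j in range(len(p)) if j != b)
    return p[b] * math.log(1.0 + beta[b] / (1.0 + interference))

def utility(r, alpha):
    """alpha-fair utility in (4)."""
    if alpha == 1:
        return math.log(r)
    return r ** (1.0 - alpha) / (1.0 - alpha)

beta = [4.0, 2.0, 1.0]   # hypothetical gains from three TPs to one user
p = [0.5, 0.8, 0.3]      # hypothetical activation fractions
print(exact_rate(beta, 0, p), conservative_rate(beta, 0, p))
```

The enumeration in `exact_rate` grows exponentially in the number of TPs, whereas `conservative_rate` needs a single pass over the interferers, which is the scaling argument made in the introduction.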

III. User Association 


We adopt the convention that $0\ln(0) = 0$ and consider
any fixed activation vector $\mathbf{p}$ with strictly positive elements
(otherwise any TP $b$ with $p_b = 0$ can be simply removed). We
proceed to systematically consider the user-association sub-problem
of (5) given by


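$$\max_{\substack{x_{k,b} \in \{0,1\}; \\ \gamma_{k,b} \in [0,1]\ \forall\, k,b}} \left\{ \sum_{k \in \mathcal{U}} \sum_{b \in \mathcal{B}} x_{k,b} \left( w_k\, u(\gamma_{k,b} R_{k,b}(\mathbf{p}), \alpha) \right) \right\}$$
$$\text{s.t.} \quad \sum_{b \in \mathcal{B}} x_{k,b} = 1,\ \forall\, k \in \mathcal{U}; \qquad \sum_{k \in \mathcal{U}} \gamma_{k,b} = 1,\ \forall\, b \in \mathcal{B}, \tag{6}$$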


over three regimes defined by the values of $\alpha$. We first define
a ground set, $\Omega = \{(k,b) : k \in \mathcal{U}, b \in \mathcal{B}\}$, that consists
of all possible tuples and where each tuple $(k,b)$ denotes an
association of user $k$ to TP $b$. Then, we also define the set
$\Omega^{(b)} = \{(k,b) : k \in \mathcal{U}\}$ for each TP $b \in \mathcal{B}$, which consists of
all tuples whose TP is $b$, along with the set $\Omega_{(k)} = \{(k,b) :
b \in \mathcal{B}\}$ for each user $k$, which consists of all tuples whose user
is $k$. Finally, we define a family of sets $\mathcal{I}$ as the one which
includes each subset of $\Omega$ such that the tuples in that subset
have mutually distinct users. Formally,

$$\mathcal{G} \subseteq \Omega : |\mathcal{G} \cap \Omega_{(k)}| \leq 1\ \forall\, k \in \mathcal{U} \iff \mathcal{G} \in \mathcal{I}. \tag{7}$$


We start with the regime $\alpha > 1$ and note that for any given
user association, i.e., for any given feasible choice of variables
$\{x_{k,b}\}$, (6) is a continuous optimization problem. Moreover, it
is separable across the set of TPs and for each TP $b \in \mathcal{B}$, we
have a convex optimization problem over the set of variables
$\{\gamma_{k,b}\}$ for $k \in \mathcal{U} : x_{k,b} = 1$. Using K.K.T. conditions it is
verified in the appendix that for each TP $b \in \mathcal{B}$

$$\max_{\gamma_{k,b} \in [0,1]\ \forall\, k} \left\{ \sum_{k \in \mathcal{U}} x_{k,b} \left( w_k\, u(\gamma_{k,b} R_{k,b}(\mathbf{p}), \alpha) \right) \right\} = -\left( \sum_{k \in \mathcal{U}} x_{k,b} \left( w_k \frac{(R_{k,b}(\mathbf{p}))^{1-\alpha}}{\alpha - 1} \right)^{1/\alpha} \right)^{\alpha}. \tag{8}$$
Consequently, upon defining

$$\Theta_k^{(b)}(\alpha) = \left( w_k \frac{(R_{k,b}(\mathbf{p}))^{1-\alpha}}{\alpha - 1} \right)^{1/\alpha}, \quad \forall\, \alpha > 1,$$
















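(6) reduces to the following discrete optimization problem.

$$\min_{\substack{x_{k,b} \in \{0,1\}\ \forall\, k,b \\ \sum_{b \in \mathcal{B}} x_{k,b} = 1\ \forall\, k}} \left\{ \sum_{b \in \mathcal{B}} \left( \sum_{k \in \mathcal{U}} x_{k,b}\, \Theta_k^{(b)}(\alpha) \right)^{\alpha} \right\} \tag{9}$$

Considering the case $\alpha \in (0,1)$, (6) reduces to

$$\max_{\substack{x_{k,b} \in \{0,1\}\ \forall\, k,b \\ \sum_{b \in \mathcal{B}} x_{k,b} = 1\ \forall\, k}} \left\{ \sum_{b \in \mathcal{B}} \left( \sum_{k \in \mathcal{U}} x_{k,b}\, \Theta_k^{(b)}(\alpha) \right)^{\alpha} \right\} \tag{10}$$

where

$$\Theta_k^{(b)}(\alpha) = \left( w_k \frac{(R_{k,b}(\mathbf{p}))^{1-\alpha}}{1 - \alpha} \right)^{1/\alpha}, \quad \forall\, \alpha \in (0,1).$$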
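The per-TP closed form behind these reductions can be sketched as follows. This is only an outline under the stated model (the complete argument is the one deferred to the appendix and may be organized differently); the shorthand $c_k$ and $S$ is introduced here purely for illustration.

```latex
% Fix a TP b and let the sums run over the users k associated to it.
% Shorthand (illustrative): c_k = w_k (R_{k,b}(p))^{1-\alpha} / |1-\alpha|.
\begin{align*}
&\text{Per-TP problem:}\quad
  \min_{\gamma}\ \sum_k c_k \gamma_k^{1-\alpha}\ (\alpha > 1)
  \quad\text{or}\quad
  \max_{\gamma}\ \sum_k c_k \gamma_k^{1-\alpha}\ (0 < \alpha < 1),
  \qquad \text{s.t. } \sum_k \gamma_k = 1. \\
&\text{Stationarity:}\quad
  |1-\alpha|\, c_k\, \gamma_k^{-\alpha} = \lambda
  \ \Longrightarrow\
  \gamma_k = \frac{c_k^{1/\alpha}}{S}, \qquad S = \sum_j c_j^{1/\alpha}. \\
&\text{Optimal value:}\quad
  \sum_k c_k \left( \frac{c_k^{1/\alpha}}{S} \right)^{1-\alpha}
  = S^{\alpha-1} \sum_k c_k^{1/\alpha}
  = S^{\alpha}
  = \Big( \sum_k \Theta_k^{(b)}(\alpha) \Big)^{\alpha}.
\end{align*}
```

In both regimes the optimal time fraction of a user is proportional to its $\Theta_k^{(b)}(\alpha)$, which is why only the sum of these terms per TP enters (9) and (10).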
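Recalling the sets $\Omega$, $\Omega_{(k)}$, $\Omega^{(b)}$ defined before, we further
define the set function $g : 2^{\Omega} \to \mathbb{R}$ as

$$g(\mathcal{G}, \alpha) = \sum_{b \in \mathcal{B}} \Big( \sum_{(k',b') \in \mathcal{G} \cap \Omega^{(b)}} \Theta_{k'}^{(b')}(\alpha) \Big)^{\alpha}, \tag{11}$$

$\forall\, \mathcal{G} \subseteq \Omega$, $\mathcal{G} \neq \phi$, with $g(\phi, \alpha) = 0$, where $\phi$ denotes the empty
set. The minimization problem in (9) is now re-formulated as

$$\min_{\mathcal{G} :\ \mathcal{G} \in \mathcal{I}\ \&\ |\mathcal{G}| = K} \{ g(\mathcal{G}, \alpha) \}, \tag{12}$$

whereas the maximization problem in (10) can be re-formulated as

$$\max_{\mathcal{G} :\ \mathcal{G} \in \mathcal{I}\ \&\ |\mathcal{G}| = K} \{ g(\mathcal{G}, \alpha) \}. \tag{13}$$

Similarly, for $\alpha = 1$, (6) can be reformulated as in (13) but
where $g(\phi, 1) = 0$ and for all $\mathcal{G} \subseteq \Omega : \mathcal{G} \neq \phi$

$$g(\mathcal{G}, 1) = \sum_{(k,b) \in \mathcal{G}} w_k \ln(w_k R_{k,b}(\mathbf{p})) - \sum_{b \in \mathcal{B}} \Big( \sum_{(k',b') \in \mathcal{G} \cap \Omega^{(b)}} w_{k'} \Big) \ln\Big( \sum_{(k',b') \in \mathcal{G} \cap \Omega^{(b)}} w_{k'} \Big). \tag{14}$$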
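The set function can be evaluated with a few lines of Python. The sketch below is illustrative only: it follows (11) for $\alpha \neq 1$ and (14) for $\alpha = 1$, and the names `theta` and `g`, the list-based inputs `w` and `R`, and the sample numbers are assumptions made here rather than anything specified in the paper.

```python
import math
from collections import defaultdict

def theta(w_k, R_kb, alpha):
    """Theta_k^{(b)}(alpha); for alpha == 1 this sketch uses the weight w_k."""
    if alpha == 1:
        return w_k
    return (w_k * R_kb ** (1.0 - alpha) / abs(1.0 - alpha)) ** (1.0 / alpha)

def g(G, alpha, w, R):
    """Set function in (11) (alpha != 1) or (14) (alpha == 1).
    G is an iterable of (user, TP) tuples, w[k] a weight, R[k][b] the rate R_{k,b}(p)."""
    load = defaultdict(float)   # per-TP sum of Theta terms
    linear = 0.0                # first sum in (14), only used when alpha == 1
    for k, b in G:
        load[b] += theta(w[k], R[k][b], alpha)
        if alpha == 1:
            linear += w[k] * math.log(w[k] * R[k][b])
    if alpha == 1:
        return linear - sum(L * math.log(L) for L in load.values())
    return sum(L ** alpha for L in load.values())

w = [0.5, 0.5]                    # hypothetical weights
R = [[2.0, 1.0], [3.0, 1.0]]      # hypothetical rates R[k][b]
print(g({(0, 0), (1, 0)}, 1, w, R), g({(0, 0), (1, 1)}, 2.0, w, R))
```

For $\alpha > 1$ the returned value is the quantity to be minimized in (12), while for $\alpha \leq 1$ it is the quantity to be maximized in (13).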
We offer the following result.

Proposition 1. For any $\alpha > 0$, the user association sub-problem
in (6) is NP-hard. Further, for any $\alpha > 1$, the
set function $g(\cdot, \alpha)$ is a normalized, non-negative and non-decreasing
supermodular set function. For any $\alpha \in (0,1)$, the
set function $g(\cdot, \alpha)$ is a normalized, non-negative and non-decreasing
submodular set function. The set function $g(\cdot, 1)$ is
a normalized submodular set function.

Note that the set function $g(\cdot, 1)$ need not be non-negative
nor non-decreasing.

A. GLS: A Unified Algorithm

In Table I we propose the GLS Algorithm, which is a
simple combinatorial algorithm to solve the problem in (6).
It considers the respective re-formulated versions in (12) or
(13) and comprises two stages. The first one is the greedy
stage (steps 1 to 6). Here in each greedy iteration the feasible
tuple $(k', b')$ (with respect to the ones already selected so far)
offering the best change in system utility is selected, until no
such tuple can be found. In particular, $(k', b')$ is determined
as

$$(k', b') = \begin{cases} \arg\max_{(k,b) \in \Omega :\ \mathcal{G} \cup (k,b) \in \mathcal{I}} \{ g(\mathcal{G} \cup (k,b), \alpha) - g(\mathcal{G}, \alpha) \}, & \alpha \leq 1, \\ \arg\min_{(k,b) \in \Omega :\ \mathcal{G} \cup (k,b) \in \mathcal{I}} \{ g(\mathcal{G} \cup (k,b), \alpha) - g(\mathcal{G}, \alpha) \}, & \alpha > 1. \end{cases}$$

The second stage of GLS is local search improvement and
comprises steps 7 to 13. Here, a feasible pair of tuples is
determined in each local search iteration as

$$(k', b_1), (k', b_2) = \begin{cases} \arg\max_{\substack{k \in \mathcal{U},\ b, b' \in \mathcal{B} : \\ (k,b) \in \hat{\mathcal{G}},\ (k,b') \notin \hat{\mathcal{G}}}} \{ g(\hat{\mathcal{G}} \cup (k,b') \setminus (k,b), \alpha) \}, & \alpha \leq 1, \\ \arg\min_{\substack{k \in \mathcal{U},\ b, b' \in \mathcal{B} : \\ (k,b) \in \hat{\mathcal{G}},\ (k,b') \notin \hat{\mathcal{G}}}} \{ g(\hat{\mathcal{G}} \cup (k,b') \setminus (k,b), \alpha) \}, & \alpha > 1, \end{cases} \tag{15}$$

and the corresponding relative improvement is deemed to be
better than $\Delta$ by checking if

$$g((\hat{\mathcal{G}} \cup (k', b_2) \setminus (k', b_1)), \alpha) - g(\hat{\mathcal{G}}, \alpha) > \Delta\, \mathrm{sgn}(g(\hat{\mathcal{G}}, \alpha))\, g(\hat{\mathcal{G}}, \alpha), \quad \alpha \leq 1, \tag{16}$$

$$g((\hat{\mathcal{G}} \cup (k', b_2) \setminus (k', b_1)), \alpha) - g(\hat{\mathcal{G}}, \alpha) < -\Delta\, g(\hat{\mathcal{G}}, \alpha), \quad \alpha > 1, \tag{17}$$

where $\mathrm{sgn}(x) = 1, \forall\, x > 0$ and $-1$ otherwise.

Table I: GLS Algorithm

1: Initialize with $\alpha$, $\Delta > 0$, MaxIter $\geq 1$, $\mathcal{G} = \phi$ and $\mathcal{U}' = \mathcal{U}$.
2: Repeat
3:   Determine $(k', b')$ as the tuple in $\Omega$ which offers the best change
     among all tuples $(k, b) \in \Omega$ such that $\mathcal{G} \cup (k, b) \in \mathcal{I}$.
4:   Update $\mathcal{G} = \mathcal{G} \cup (k', b')$ and $\mathcal{U}' = \mathcal{U}' \setminus \{k'\}$
5: Until $\mathcal{U}' = \phi$.
6: Set $\hat{\mathcal{G}} = \mathcal{G}$, Iter = 0.
7: Repeat
8:   Increment Iter = Iter + 1.
9:   Find a pair of tuples: $(k', b_1) \in \hat{\mathcal{G}}$ and $(k', b_2) \in \Omega \setminus \hat{\mathcal{G}}$ such
     that the relative improvement upon swapping $(k', b_1) \in \hat{\mathcal{G}}$ with
     $(k', b_2)$ is better than $\Delta$.
10:  If such a pair exists then
11:    Update $\hat{\mathcal{G}} = \hat{\mathcal{G}} \cup (k', b_2) \setminus (k', b_1)$.
12:  End If
13: Until no such pair exists or Iter = MaxIter.
14: Output $\hat{\mathcal{G}}$.
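The two stages of Table I can be sketched in Python as below. This is an illustrative, unoptimized version rather than the authors' implementation: it recomputes the set function from scratch for every candidate (so it does not attain the complexity quoted for GLS), and the names `gls`, `delta`, `max_iter` and the input layout (`w[k]`, `R[k][b]`) are assumptions made for the example.

```python
import math
from collections import defaultdict

def g(G, alpha, w, R):
    """Set function (11)/(14) for a set G of (user, TP) tuples."""
    load, linear = defaultdict(float), 0.0
    for k, b in G:
        if alpha == 1:
            load[b] += w[k]
            linear += w[k] * math.log(w[k] * R[k][b])
        else:
            load[b] += (w[k] * R[k][b] ** (1 - alpha) / abs(1 - alpha)) ** (1 / alpha)
    if alpha == 1:
        return linear - sum(L * math.log(L) for L in load.values())
    return sum(L ** alpha for L in load.values())

def gls(w, R, alpha, delta=1e-3, max_iter=50):
    """Returns (greedy association, final association) as sets of (user, TP)."""
    K, B = len(R), len(R[0])
    maximize = alpha <= 1                 # (13) for alpha <= 1, (12) for alpha > 1
    sign = 1.0 if maximize else -1.0      # so that "larger is better" in both cases
    G, free = set(), set(range(K))
    # Greedy stage (steps 1 to 6): add the tuple with the best change in g.
    while free:
        base = g(G, alpha, w, R)
        k, b = max(((k, b) for k in free for b in range(B)),
                   key=lambda e: sign * (g(G | {e}, alpha, w, R) - base))
        G.add((k, b))
        free.remove(k)
    greedy = set(G)
    # Local search stage (steps 7 to 13): move one user to another TP.
    for _ in range(max_iter):
        cur = g(G, alpha, w, R)
        swaps = [((k, b), (k, b2)) for (k, b) in G for b2 in range(B) if b2 != b]
        if not swaps:
            break
        old, new = max(swaps, key=lambda s: sign * g((G - {s[0]}) | {s[1]}, alpha, w, R))
        change = g((G - {old}) | {new}, alpha, w, R) - cur
        if maximize:
            accept = change > delta * abs(cur)   # (16), using sgn(x) * x = |x|
        else:
            accept = change < -delta * cur       # (17)
        if not accept:
            break
        G = (G - {old}) | {new}
    return greedy, G

w = [0.1, 0.2, 0.3, 0.4]                                   # hypothetical weights
R = [[1.0, 2.0], [3.0, 0.5], [1.5, 1.5], [0.2, 4.0]]        # hypothetical rates
print(gls(w, R, alpha=1))
```

Because a swap is accepted only when it strictly improves the objective, the local search cannot cycle, and `max_iter` plays the role of MaxIter in Table I.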

We now proceed to analyze the performance of GLS. We
seek to bound the gap (by obtaining easily computable bounds)
between the optimal system utility and the one returned by
GLS. Towards this end, let $\mathcal{G}^{opt}$ denote the optimal solution
to the problem in (12) for $\alpha > 1$ or (13) for $\alpha \in (0,1]$, and let
$\hat{\mathcal{G}}$, $\check{\mathcal{G}}$ denote the counterparts obtained by our algorithm as the
final output and after the greedy stage, respectively. We will
first analyze the performance of the greedy first stage. The
challenge here is that the underlying set function need not
be submodular (when $\alpha > 1$) or it need not be non-negative
and non-decreasing (when $\alpha = 1$), which precludes us from
directly applying the analysis in [30], [31]. To overcome this
limitation, we first derive new bounds that relate the optimal
solution to that returned by the greedy stage. These bounds are
in fact applicable to arbitrary submodular or supermodular set
functions. We then specialize those bounds to the set functions
of interest to us in (11) and (14) to obtain the following result.










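Proposition 2. For any given $\alpha$, the greedy stage yields an
output $\check{\mathcal{G}}$ such that

$$g(\check{\mathcal{G}}, \alpha) \geq g(\mathcal{G}^{opt}, \alpha)/2, \quad \forall\, \alpha \in (0,1),$$
$$g(\check{\mathcal{G}}, 1) \geq g(\mathcal{G}^{opt}, 1) - 2\ln(2),$$
$$(3 - 2^{\alpha})\, g(\check{\mathcal{G}}, \alpha) \leq g(\mathcal{G}^{opt}, \alpha), \quad \forall\, \alpha > 1.$$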

Remark 1. Note that the last bound in Proposition 2 is
meaningful in the regime $\alpha \in \left(1, \frac{\ln(3)}{\ln(2)}\right)$ since then $3 - 2^{\alpha} > 0$.
As a result, we can deduce that for all $\alpha \in \left(0, \frac{\ln(3)}{\ln(2)}\right)$ the
greedy stage of GLS itself provides firm (instance independent)
guarantees. However, as $\alpha$ is increased, the performance of
the greedy stage degrades compared to the optimal and the
local search stage of GLS becomes increasingly important.
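As a quick numerical illustration of this remark (a sketch that reads the last bound of Proposition 2 as $g(\check{\mathcal{G}}, \alpha) \leq g(\mathcal{G}^{opt}, \alpha)/(3 - 2^{\alpha})$; the sample values of $\alpha$ are chosen here only for illustration):

```latex
\begin{align*}
\alpha = 1.1 &: \quad 3 - 2^{1.1} \approx 0.856, \qquad \frac{1}{3 - 2^{1.1}} \approx 1.17, \\
\alpha = 1.5 &: \quad 3 - 2^{1.5} \approx 0.172, \qquad \frac{1}{3 - 2^{1.5}} \approx 5.83, \\
\alpha \to \tfrac{\ln(3)}{\ln(2)} \approx 1.585 &: \quad 3 - 2^{\alpha} \to 0 .
\end{align*}
```

Under this reading the multiplicative guarantee of the greedy stage loosens rapidly as $\alpha$ approaches $\ln(3)/\ln(2)$.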


We now proceed to examine the performance of the local
search stage. We leverage the techniques developed in [31]
to analyze the behaviour of a local search based algorithm
when the latter is used to maximize non-negative submodular
functions. Here, we extend those techniques to arbitrary
submodular and non-negative supermodular functions and also
obtain sharper bounds. We let $e = (k, b)$ denote any tuple in
$\Omega$ and expand $\hat{\mathcal{G}}$ as $\hat{\mathcal{G}} = \{e_1, \cdots, e_K\}$.


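Proposition 3. The GLS algorithm for any given $\Delta > 0$ yields
an output $\hat{\mathcal{G}}$ such that for any given $\alpha > 1$

$$g(\mathcal{G}^{opt}, \alpha) \geq g(\hat{\mathcal{G}}, \alpha) + K(1 - \Delta)\, g(\hat{\mathcal{G}}, \alpha) - h(\hat{\mathcal{G}}, \alpha)$$

and for any given $\alpha \in (0,1)$

$$g(\mathcal{G}^{opt}, \alpha) \leq g(\hat{\mathcal{G}}, \alpha) + K(1 + \Delta)\, g(\hat{\mathcal{G}}, \alpha) - h(\hat{\mathcal{G}}, \alpha).$$

Further, for $\alpha = 1$

$$g(\mathcal{G}^{opt}, 1) \leq g(\hat{\mathcal{G}}, 1) + K(1 + \Delta\, \mathrm{sgn}(g(\hat{\mathcal{G}}, 1)))\, g(\hat{\mathcal{G}}, 1) - h(\hat{\mathcal{G}}, 1).$$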

where, h(g,a) = a) + Ef=i_0(4 a)-fl(0\ 

e n , ct)), for any subset O C H : gf pt U g C O. 

Finally, we note that one obvious choice of the subset $\Omega'$
needed in Proposition 3 is $\Omega' = \Omega$. However, for $\alpha > 1$ this
choice results in loose bounds and a better option is to set $\Omega'$
to be the set obtained after removing each tuple $e$ satisfying
$g(e, \alpha) > g(\hat{\mathcal{G}}, \alpha)$ from $\Omega$. Note that no such tuple can be
either in $\hat{\mathcal{G}}$ or $\mathcal{G}^{opt}$. Note then that the bounds in Proposition
3 are easily computable once we have the output $\hat{\mathcal{G}}$.

Regarding the complexity of GLS, it is easy to see that the
complexity of the greedy stage is $O(K^2 B)$. Moreover, each
iteration in the local search (LS) stage has $O(BK)$ complexity.
Further, simulation results presented later reveal that even for a
large-sized HetNet ($KB \approx 3000$) only very few LS iterations
(6 or less) are needed to capture the available gains.


B. Distributed Version

The GLS algorithm presented above assumes a centralized
implementation. While this assumption is not very restrictive
due to the fact that the implementation is done at a coarse time
scale relying on average (not instantaneous) estimates, in practice
a distributed implementation brings its own advantages.
Remarkably, as we show next, for any given activation
vector $\mathbf{p}$, a distributed variant of the GLS algorithm that
offers identical performance guarantees is indeed possible.
We make a (justifiable) assumption that each user $k \in \mathcal{U}$
is supposed to know its weight $w_k$ and its (single-user)
rates $R_{k,b}(\mathbf{p}), \forall\, b \in \mathcal{B}$. Consequently, each user $k$ can be
configured to compute $\Theta_k^{(b)}(\alpha), \forall\, b \in \mathcal{B}$ given the fairness
parameter $\alpha$. $\Theta_k^{(b)}(\alpha), \forall\, k,b$ was defined before for all $\alpha \neq 1$
and here for later use we define $\Theta_k^{(b)}(1) = w_k, \forall\, k,b$. We
will first derive a distributed version of the greedy stage of
the GLS algorithm. Recall that in this stage a feasible subset
of tuples $\mathcal{G}$ is built up. Then, we note the simple but key fact
that given any subset of selected tuples $\mathcal{G} \in \mathcal{I}$, the change in
system utility upon adding a tuple $(k,b) \notin \mathcal{G}$ to $\mathcal{G}$, given by
$g(\mathcal{G} \cup (k,b), \alpha) - g(\mathcal{G}, \alpha)$, can be expressed as

$$\begin{cases} \Theta_k^{(b)}(1) \ln(\Theta_k^{(b)}(1) R_{k,b}(\mathbf{p})) + \Psi^{(b)}(1) \ln(\Psi^{(b)}(1)) - (\Theta_k^{(b)}(1) + \Psi^{(b)}(1)) \ln(\Theta_k^{(b)}(1) + \Psi^{(b)}(1)), & \alpha = 1, \\ (\Theta_k^{(b)}(\alpha) + \Psi^{(b)}(\alpha))^{\alpha} - (\Psi^{(b)}(\alpha))^{\alpha}, & \text{else}, \end{cases}$$

where we define $\Psi^{(b)}(\alpha) = \sum_{(k',b') \in \mathcal{G} \cap \Omega^{(b)}} \Theta_{k'}^{(b')}(\alpha), \forall\, \alpha$.

As a result, each user $k$ (that has not associated to any TP
yet) can compute the change in system utility if it joins any
TP $b \in \mathcal{B}$, provided it knows $\Psi^{(b)}(\alpha)$, which we refer to as
the current load on TP $b$. This suggests a natural distributed
algorithm (outlined in Table II as the distributed greedy stage)
comprising two parts, namely, the TP-side and the user-side
procedures. Considering the TP-side procedure, all TPs
periodically broadcast their current load at the start of each
time window on a designated slot, where the window size
is chosen to accommodate all propagation, acknowledgement
and processing delays, and where the broadcasting parameters
(powers, assigned codes etc.) ensure that the loads can be
reliably decoded by the users. We assume a particularly simple
procedure where each TP admits only the first user (who has
requested to associate) in each window. Moving to the user-side
procedure, each user uses the current loads to determine
the TP yielding the best system utility change, where the best
change corresponds to the largest change for $\alpha \leq 1$ and to
the smallest change for $\alpha > 1$. Note here that in each window
(defined as the time interval between two consecutive load-broadcast
slots) multiple associations can be done. Indeed, in
each window, each TP that receives one or more user requests
will admit one user, and each un-associated user will send
one request. Hence, the distributed greedy stage will complete
all associations in no more than $K$ windows. We offer the
following important result.

Proposition 4. The solution obtained after the distributed
greedy stage yields the same guarantees as in Proposition 2.
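The load-based computation behind the distributed greedy stage can be sketched in Python as follows. This is a simplified simulation and not the authors' protocol: ACK/NACK signalling and delays are abstracted away, the "first requesting user" at a TP is modelled as the lowest user index (an assumption), and the names `theta`, `gain` and `distributed_greedy` are hypothetical.

```python
import math

def theta(w_k, R_kb, alpha):
    """Theta_k^{(b)}(alpha), with Theta_k^{(b)}(1) = w_k."""
    if alpha == 1:
        return w_k
    return (w_k * R_kb ** (1 - alpha) / abs(1 - alpha)) ** (1 / alpha)

def xlogx(x):
    return x * math.log(x) if x > 0 else 0.0   # convention 0 ln(0) = 0

def gain(w_k, R_kb, psi_b, alpha):
    """Change in system utility if the user joins a TP whose broadcast load is psi_b."""
    t = theta(w_k, R_kb, alpha)
    if alpha == 1:
        return t * math.log(t * R_kb) + xlogx(psi_b) - xlogx(t + psi_b)
    return (t + psi_b) ** alpha - psi_b ** alpha

def distributed_greedy(w, R, alpha):
    """Window-by-window simulation; returns ({user: TP}, number of windows)."""
    K, B = len(R), len(R[0])
    psi = [0.0] * B          # current load of each TP
    assoc, windows = {}, 0
    pick = max if alpha <= 1 else min   # best change: largest or smallest
    while len(assoc) < K:
        windows += 1
        broadcast = list(psi)           # loads decoded at the start of the window
        requests = {}
        for k in range(K):
            if k in assoc:
                continue
            b = pick(range(B), key=lambda b: gain(w[k], R[k][b], broadcast[b], alpha))
            requests.setdefault(b, k)   # each TP admits only its first requester
        for b, k in requests.items():
            assoc[k] = b
            psi[b] += theta(w[k], R[k][b], alpha)
    return assoc, windows

w = [0.1, 0.2, 0.3, 0.4]                                   # hypothetical weights
R = [[1.0, 2.0], [3.0, 0.5], [1.5, 1.5], [0.2, 4.0]]        # hypothetical rates
print(distributed_greedy(w, R, alpha=2.0))
```

Each window admits at least one user, so the loop finishes within $K$ windows, in line with the bound stated above.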


Table II: Distributed Greedy Stage

TP-side procedure: At each TP $b \in \mathcal{B}$
Repeat
  Broadcast step:
    Transmit current load $\Psi^{(b)}(\alpha)$
  Monitoring step:
    If request from any user $k$ detected
      If another user already admitted
        Send NACK to the requesting user $k$
      Else
        Admit user $k$ and send an ACK
        Update current load $\Psi^{(b)}(\alpha) \to \Psi^{(b)}(\alpha) + \Theta_k^{(b)}(\alpha)$
      EndIf
    EndIf
Until No user request and no other TP changes its load
User-side procedure: At each user $k \in \mathcal{U}$
Repeat
  Listening step:
    Decode all current loads $\Psi^{(b)}(\alpha), \forall\, b \in \mathcal{B}$
  Request step:
    Evaluate utility change upon joining each TP $b \in \mathcal{B}$
    Determine TP $\hat{b}$ corresponding to best change
    Send a request to associate to TP $\hat{b}$ along with $\Theta_k^{(\hat{b})}(\alpha)$
Until ACK received from requested TP

We now consider the LS stage of the GLS algorithm and
offer its distributed counterpart. This distributed algorithm
is initiated once the (build-up) greedy stage terminates after
associating each user to a TP. All TPs periodically broadcast
their current load information at the start of each window
on a designated slot. The load information of TP $b$ includes
$\Psi^{(b)}(\alpha)$ as before. In addition, when $\alpha = 1$ it also includes
the term $\sum_k w_k \ln(w_k R_{k,b}(\mathbf{p}))$, where the sum is over all users
currently associated to TP $b$. The first key observation behind
this algorithm is that given all the current load information,
each user can determine its switch or migration that yields
the best change in system utility (15). Moreover, it can also
assess (via (16) and (17)) if that switch yields a relative
improvement better than $\Delta$. Note here that in each window
in order to ensure a distributed implementation we permit
multiple users to migrate, albeit to distinct TPs. Prima facie
it is not apparent that the procedure will converge, since
each user which migrates in any window only guarantees an
improvement in system utility if no other user migrates in
that window. The other key aspect which ensures convergence
is the introduction of a randomized decision rule at each
TP. This rule is described next and it is essential to ensure
convergence to a solution at which no migration that yields a
relative improvement better than $\Delta$ can be found. In particular,
under this randomized rule, each TP $b$ that receives a request
from some user $k$ sets its decision to accept to be negative
if it has already admitted another user in that window. On
the other hand, if no user has been admitted by it, that TP
generates a binary-valued ($\{0,1\}$) random variable with a
specified probability $p \in (0,1)$. It then sets its decision to be
positive if the generated variable has value one, failing which
it sets the decision to be negative.

In the appendix we show that the proposed distributed LS
stage provably converges and the solution it yields upon convergence
yields the same guarantees as in Proposition 3. We
note here that a distributed user-centric randomized algorithm
has been recently proposed in ©. However, proving the
convergence of that algorithm for arbitrary $\alpha$ remains an open
problem.

Table III: Distributed LS Stage

TP-side procedure: At each TP $b \in \mathcal{B}$
Repeat
  Broadcast step:
    Transmit current load information
  Monitoring step:
    If request to associate from any user $k$ detected
      Determine decision via randomized rule
      If decision to accept is positive
        Send ACK to user $k$
        Update current load information
      Else
        Send NACK to user $k$
      EndIf
    EndIf
    If request to release from any user $k$ detected
      Release user $k$
      Update current load information
    EndIf
Until Convergence
User-side procedure: At each user $k \in \mathcal{U}$
Repeat
  Listening step:
    Decode all current load information
  Request step:
    Compute utility changes for all migrations
    Determine TP $\hat{b}$ corresponding to best change
    Send association request to TP $\hat{b}$ if relative improvement
    is better than $\Delta$
    If ACK received from TP $\hat{b}$
      Send request to release to current TP
      Send $w_k, R_{k,\hat{b}}(\mathbf{p})$ to TP $\hat{b}$
    EndIf
Until ACK received from requested TP


IV. AF Optimization


The association scheme described in the previous section
determines $\mathcal{U}^{(b)}$, the set of users associated to TP $b$ for all
$b \in \mathcal{B}$. In this section, for a given user association, we present
a centralized algorithm to determine $p_b$ for each $b$ so as
to optimize the system utility over different $\alpha$ regimes. For
brevity we suppose that $\alpha > 1$. The analogous results for all
other $\alpha$ values as well as an equivalent distributed variant of
the proposed approach are deferred to the appendix. The AF
optimization problem in this regime is given by

min P e[o,ip {Eges (E keuw W(-Rfc,g(p)) 1-1/ “)“}( 18) 

where utk = (yyry) 1 ^ and Rk,b(p) is given by j2j). We 
let f3k = {/3fc,g} Vfe £ B denote the vector containing all 
fading coefficients pertaining to user k on any slot. Then, we 
introduce auxiliary variables Pfc,g(/3fc) for each vector f3k for 
each user k £ U^ b) for each TP b. Using gk,b(/3k ) as a filter 
at user k to detect the signal transmitted from TP b over that 
slot, the mean squared error (MSE), efc,g(/3fc,p), is given by 


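\[
e_{k,b}(\boldsymbol{\beta}_k, \mathbf{p}) = \left| g_{k,b}(\boldsymbol{\beta}_k)\sqrt{\beta_{k,b}} - 1 \right|^2 + |g_{k,b}(\boldsymbol{\beta}_k)|^2 \sum_{b' \neq b} \beta_{k,b'}\, p_{b'} + |g_{k,b}(\boldsymbol{\beta}_k)|^2 \tag{19}
\]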






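Using the mutual information and MSE identity and introducing
more auxiliary variables (cf. [28]), we have

\[
R_{k,b}(\mathbf{p}) = p_b\, \mathbb{E}\left[ \max_{g_{k,b}(\boldsymbol{\beta}_k),\; s_{k,b}(\boldsymbol{\beta}_k) > 0} \left\{ 1 - s_{k,b}(\boldsymbol{\beta}_k)\, e_{k,b}(\boldsymbol{\beta}_k, \mathbf{p}) + \log\left(s_{k,b}(\boldsymbol{\beta}_k)\right) \right\} \right] \tag{20}
\]

The solution of each inner maximization problem in (20) is obtained
by setting $g_{k,b}(\boldsymbol{\beta}_k)$ to be the MMSE filter $\hat{g}_{k,b}(\boldsymbol{\beta}_k)$ with
$s_{k,b}(\boldsymbol{\beta}_k) = \hat{s}_{k,b}(\boldsymbol{\beta}_k) = 1/\hat{e}_{k,b}(\boldsymbol{\beta}_k, \mathbf{p})$, where $\hat{e}_{k,b}(\boldsymbol{\beta}_k, \mathbf{p}) =
e_{k,b}(\boldsymbol{\beta}_k, \mathbf{p})\big|_{g_{k,b}(\boldsymbol{\beta}_k) = \hat{g}_{k,b}(\boldsymbol{\beta}_k)}$. Using (20), the problem in
(18) (for the given association) can be re-formulated as
the following optimization problem over variables $\mathbf{p}$, $\mathbf{s} =
\{s_{k,b}(\boldsymbol{\beta}_k)\}$, $\mathbf{g} = \{g_{k,b}(\boldsymbol{\beta}_k)\}$ $\forall \boldsymbol{\beta}_k, k \in \mathcal{U}^{(b)}, b \in \mathcal{B}$.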


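\[
\min_{\mathbf{p} \in [0,1]^{B},\; \mathbf{g} \ge 0,\; \mathbf{s} \ge 1} \left\{ \sum_{b \in \mathcal{B}} \left( \sum_{k \in \mathcal{U}^{(b)}} \frac{\varpi_k}{\left( p_b\, \mathbb{E}\left[ 1 - s_{k,b}(\boldsymbol{\beta}_k)\, e_{k,b}(\boldsymbol{\beta}_k, \mathbf{p}) + \log(s_{k,b}(\boldsymbol{\beta}_k)) \right] \right)^{1-1/\alpha}} \right)^{\alpha} \right\} \tag{21}
\]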

Note that for a fixed $\mathbf{p}$, (21) can be optimized over $\mathbf{s}, \mathbf{g}$ via
the closed form expressions given above. On the other hand,
for fixed $\mathbf{s}, \mathbf{g}$, to optimize (21) over $\mathbf{p}$ we introduce additional
variables $\mathbf{z} = \{z_b\}\ \forall b \in \mathcal{B}$ and $\mathbf{t} = \{t_{k,b}\},\ \forall k \in \mathcal{U}^{(b)}, b \in \mathcal{B}$
and express the reduced problem in (21) as

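\[
\begin{aligned}
\min_{\mathbf{p} \in [0,1]^{B},\; \mathbf{z} \ge 0,\; \mathbf{t} \ge 0} \quad & \sum_{b \in \mathcal{B}} z_b^{\alpha} \\
\text{subject to} \quad & z_b \ge \sum_{k \in \mathcal{U}^{(b)}} \varpi_k\, t_{k,b}^{1/\alpha - 1} \quad \forall b, \\
& t_{k,b} \le p_b\, \mathbb{E}\left[ 1 - s_{k,b}(\boldsymbol{\beta}_k)\, e_{k,b}(\boldsymbol{\beta}_k, \mathbf{p}) + \log(s_{k,b}(\boldsymbol{\beta}_k)) \right] \quad \forall k, b.
\end{aligned} \tag{22}
\]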


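Notice that (22) can in turn be re-written as

\[
\begin{aligned}
\min_{\mathbf{p} \in [0,1]^{B},\; \mathbf{z} \ge 0,\; \mathbf{t} \ge 0} \quad & \sum_{b \in \mathcal{B}} z_b^{\alpha} \\
\text{subject to} \quad & \sum_{k \in \mathcal{U}^{(b)}} \varpi_k\, t_{k,b}^{1/\alpha - 1}\, z_b^{-1} \le 1 \quad \forall b, \\
& \frac{t_{k,b}\, p_b^{-1} + \mathbb{E}\left[ s_{k,b}(\boldsymbol{\beta}_k)\, e_{k,b}(\boldsymbol{\beta}_k, \mathbf{p}) \right]}{1 + \mathbb{E}\left[ \log(s_{k,b}(\boldsymbol{\beta}_k)) \right]} \le 1 \quad \forall k, b.
\end{aligned} \tag{23}
\]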

The problem in (23) is a geometric program (GP) since the objective
is a posynomial and all constraints are upper bounds on posynomials.
Thus, we can repeat the following two steps until convergence.

1) Fix $\mathbf{p}$ and minimize (21) over $\mathbf{s}, \mathbf{g}$ using the closed form
solution of (20).

2) Fix $\mathbf{s}, \mathbf{g}$ and minimize (21) over $\mathbf{p}$ by solving the equivalent
GP in (23).

Note that in the described auxiliary function method the objective
value of (21) improves monotonically and is bounded below, so
that convergence is guaranteed.
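The closed-form update over the filter and the weight can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: a real scalar channel with gain `beta`, unit noise variance, and a single aggregate term `interference` standing in for the weighted sum of interfering gains. All function names are hypothetical. It checks that plugging the MMSE filter and $s = 1/e$ into $1 - s e + \log s$ gives $\log(1 + \mathrm{SINR})$.

```python
import math

def mse(g, beta, interference):
    # MSE of a real filter g: signal mismatch plus filtered interference and noise.
    return (g * math.sqrt(beta) - 1.0) ** 2 + g * g * (interference + 1.0)

def closed_form(beta, interference):
    # MMSE filter and the matching weight s = 1 / (MSE at the MMSE filter).
    g_hat = math.sqrt(beta) / (beta + interference + 1.0)
    s_hat = 1.0 / mse(g_hat, beta, interference)
    return g_hat, s_hat

def inner_objective(g, s, beta, interference):
    # The bracketed term that is maximized over g and s.
    return 1.0 - s * mse(g, beta, interference) + math.log(s)

beta, interference = 2.5, 0.8
g_hat, s_hat = closed_form(beta, interference)
value = inner_objective(g_hat, s_hat, beta, interference)
rate = math.log(1.0 + beta / (1.0 + interference))
print(abs(value - rate) < 1e-12)
```

Perturbing either `g_hat` or `s_hat` lowers the inner objective, which is why step 1 can be carried out in closed form.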




Figure 1: Joint Association and AF optimization block diagram 


V. Joint Association & AF optimization 


We propose two joint association & AF optimization algorithms
for solving the problem in (5). These algorithms follow
an alternating optimization approach where user association
(stage-1) and AF (stage-2) are optimized in an alternating
fashion. Fig. 1 shows a block-level decomposition. The first
algorithm is the Joint GLS-AF algorithm, in which we first run
the GLS algorithm (the algorithm in Table I) and use the obtained
association in our AF optimization algorithm of Section IV.
We repeat the following two steps until the improvement in
the alpha-fairness system utility falls below a threshold.

1) Stage 1: Fix $\mathbf{p}$ and use the GLS algorithm to calculate the user
association.

2) Stage 2: Fix the association and optimize over $\mathbf{p}$ using the
auxiliary function method given in Section IV.
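The two-stage loop can be sketched as a generic alternating routine. This is a skeleton only: `associate`, `optimize_af` and `utility` are hypothetical callbacks standing in for the association stage, the AF stage and the system utility. The utility is assumed to be maximized, so a cost for $\alpha > 1$ would be negated.

```python
from typing import Callable, Dict, List, Tuple

Association = Dict[int, int]   # user index -> TP index
Fractions = List[float]        # activation fraction per TP

def alternate(
    associate: Callable[[Fractions], Association],
    optimize_af: Callable[[Association, Fractions], Fractions],
    utility: Callable[[Association, Fractions], float],
    p0: Fractions,
    threshold: float = 1e-3,
    max_rounds: int = 50,
) -> Tuple[Association, Fractions, float]:
    p = list(p0)
    assoc = associate(p)                   # stage 1 with the initial fractions
    best = utility(assoc, p)
    for _ in range(max_rounds):
        p_new = optimize_af(assoc, p)      # stage 2: association held fixed
        assoc_new = associate(p_new)       # stage 1: fractions held fixed
        value = utility(assoc_new, p_new)
        if value - best < threshold:       # stop once the benefit is too small
            break
        assoc, p, best = assoc_new, p_new, value
    return assoc, p, best

# Toy callbacks (purely illustrative): the utility peaks at p_b = 0.5.
assoc, p, best = alternate(
    associate=lambda p: {0: 0},
    optimize_af=lambda a, p: [0.5 * (x + 0.5) for x in p],
    utility=lambda a, p: -sum((x - 0.5) ** 2 for x in p),
    p0=[1.0, 1.0],
    threshold=1e-6,
)
```

A round whose improvement falls below the threshold is discarded, so the returned utility is never worse than that of the initial association.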


It is evident that both stages in the above alternating approach
can be performed using the respective distributed versions
that we derived before. However, one issue with the proposed
Joint GLS-AF algorithm is that the TPs that do not serve any
user in any one iteration will be discarded in all subsequent
iterations. To overcome this potential limitation, we consider
the joint relaxed association and AF (Joint RA-AF) algorithm.
To obtain the association, this latter algorithm in stage-1
solves the convex optimization problem obtained by relaxing the
variables $x_{k,b}, \forall k, b$ in (9) or (10) to be continuous variables
in [0,1]. In this solution, a user $k$ can have $x_{k,b}$ non-zero
for more than one TP $b$. In stage-2, the algorithm fixes $x_{k,b}$
for all $k, b$ and optimizes the AF. To do so, it uses the
auxiliary function method of Section IV on the objective in the
problem (9) rather than (18), as $x_{k,b}$ can now have fractional
values. This two-stage procedure is repeated until the benefit in
system utility falls below a threshold. Finally, the Joint RA-AF
algorithm rounds $x_{k,b}$ to obtain a feasible association.
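One simple rounding rule connects each user to the TP holding its largest fractional association value. The sketch below assumes that rule. The function name and the tie-breaking choice (lowest TP index) are illustrative assumptions rather than details taken from the text.

```python
from typing import Dict, List

def round_association(x: List[List[float]]) -> Dict[int, int]:
    """x[k][b] is the fractional association of user k to TP b."""
    assoc: Dict[int, int] = {}
    for k, row in enumerate(x):
        # max() returns the first maximizer, so ties go to the lowest TP index.
        assoc[k] = max(range(len(row)), key=lambda b: row[b])
    return assoc

x = [[0.2, 0.8], [0.5, 0.5], [1.0, 0.0]]
rounded = round_association(x)
```

Each user ends up with exactly one TP, so the rounded solution is a feasible association.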


VI. Evaluation 

We present a detailed evaluation of our proposed Greedy
Local Search (GLS) algorithm, the distributed Greedy (DG)
algorithm and the joint association & AF optimization algorithms
over an LTE HetNet deployment. In our evaluation
topology an enhanced NodeB (eNB) covers the coordination
area. The eNB site comprises three cells (sectors), where
each sector contains a set of eleven TPs formed by one macro
and ten lower power (pico) nodes. We drop ninety-nine users
on the eNB site so there are a total of $B = 33$ TPs and
$K = 99$ users. All TPs and users have a single antenna each.



















  α      Greedy     GLS        RU         RRA        MSA        DG         LSI
  0.25     67.75      67.82      67.82      67.82      65.08      67.48    1
  0.5     112.67     112.67     112.71     112.52     107.03     110.39    0
  0.75    288.57     288.57     288.82     288.46     277.65     283.98    0
  1.0    -133.93    -133.87    -133.31    -133.93    -154.67    -139.76    1

Table IV: Utility versus α (LSI: number of local search iterations)


  α      Greedy    GLS      LSI   |   α      Greedy    GLS      LSI
  1.25    563.9    563.9    0     |   2.75    975.2    956.1    2
  1.5     411.4    411.3    1     |   3.0    1345.8   1314.2    2
  1.75    408.7    406.8    2     |   3.25   1904.6   1853.0    2
  2.0     462.6    458.9    2     |   3.5    2754.6   2671.2    2
  2.25    565.6    559.0    2     |   3.75   4045.1   3911.4    2
  2.5     728.5    717.2    2     |   4.0    5953.6   5740.7    2

Table V: Local Search Improvement


We employ the conservative rates and ignore fast fading in the 
results presented in Section VI-A & Section VI-B. The results 
incorporating actual rates, fast fading and efficient per-slot user 
scheduling are presented later in Section VI-C. 

A. Association 

We compare the GLS & DG algorithms proposed in Section
III-A and Section III-B, respectively, to the following:

• Relaxed Upper Bound (RU): Solves the convex optimization
problem obtained by relaxing $x_{k,b}$ in (9) or (10). Though the
obtained solution need not be feasible for (6), the scheme
provides us with an upper bound on the optimal value of (6).

• Relaxed Rounded Association (RRA): Solves the convex
optimization problem obtained by relaxing $x_{k,b}$ in (9) or
(10). Each user $k$ connects to the TP $b$ corresponding to the
highest $x_{k,b}$ in the obtained convex optimization solution.
This scheme is widely used to represent the performance
that can be achieved by a feasible and near-optimal user
association scheme. However, it requires solving a convex
problem that can be computationally quite complex compared
to GLS in a dense deployment.

• Max SNR Association (MSA): Each user independently
connects to the TP from which it sees the highest average
channel gain. This scheme is the most common baseline.

We evaluate the association algorithms by examining their
returned utility function values for varying $\alpha$. We also evaluate
the additional gain yielded by the local search (LS) stage over
the greedy one in the GLS algorithm.

1) $\alpha < 1$: We begin with an evaluation of GLS and the
distributed greedy (DG) algorithm in the regime $\alpha < 1$,
where we consider the maximization of the objective in (10).
We set $p_b = 1$ for each of the 33 TPs and list the utility
values of different association algorithms in Table IV. As
suggested by the guarantee in Proposition 2, we observe that
the greedy stage of GLS itself performs very close to the upper
bound RU, and hence close to the optimal, and provides good
gains over the MSA scheme. Notice that GLS performs at least as
well as RRA despite having a much lower computational complexity.
Moreover, the DG algorithm performs close to the former
two, while simultaneously offering the benefits of a
distributed implementation. We also observe that the number of
local search iterations (LSIs) of GLS is at most 1 and that there is a
slight utility gain obtained by the LS stage. Interestingly, upon
employing the association algorithm from GD we observed
that the GLS indeed yields the optimal association for this
example when $\alpha = 1$.

2) $\alpha > 1$: Next we study the performance of the GLS &
DG algorithms in the $\alpha > 1$ region, where we consider the
minimization of the objective in (9). As seen in Fig. 2(a)
the proposed GLS & DG perform very similarly and they
noticeably outperform RRA in the $\alpha > 3$ regime while beating
MSA over the entire range of $\alpha > 1$. For example, GLS
performs 13.5% better than RRA and 80% better than MSA
at $\alpha = 4$. MSA performs poorly throughout the $\alpha > 1$ regime
since it has a naive user-specific view rather than an optimized
system-specific view. The superiority of GLS & DG over
RRA & MSA increases with increase in $\alpha$. For example, at
a high $\alpha = 10$, which approaches max-min fairness, the GLS
outperforms RRA & MSA by 93.2% and 100% respectively.
In Table V we study the advantage of doing local search in
the $\alpha > 1$ region. It is known that the greedy algorithm does
not yield a constant factor approximation for the constrained
minimization of a non-negative non-decreasing supermodular
set function.¹ Therefore, the greedy stage need not be close
to the optimal and there is room for improvement by the LS
stage. As seen in Table V, though the number of LS iterations
is at most 2, the gain over the greedy stage is up to 3.6%.
At a higher $\alpha = 10$ the gain of GLS over greedy shoots up to
43%, with the number of LS iterations equal to 5. Therefore,
as $\alpha$ is progressively increased, the local search stage of the
GLS algorithm becomes increasingly important.

Figure 2: (a) Utility vs α. (b) Utility vs iterations.

B. Joint Association & Activation fraction optimization 

In Fig. 2(b) we study the performance of the two joint
algorithms described in Section V for $\alpha = 3.0$ for up to 4
iterations. Each point in the plot corresponds to an iteration,
and is the utility value obtained using the updated association,
where that association itself is calculated using the updated
value of the activation fractions. The value at the first iteration
is the utility corresponding to the association done using AF
equal to 1 for all TPs. In the Joint RA-AF, at every iteration we
calculate the utility by rounding the fractional association as
done in the RRA algorithm. However, as mentioned in Section
V, fractional values of the association variables $\{x_{k,b}\}$ are
passed on to its second stage of AF identification. MSA with
$p_b = 1$ for each TP, with a utility value of 3531.8, performs
much worse than the Joint GLS-AF & Joint RA-AF schemes.
We obtain a gain of 6.1% for Joint GLS-AF over the case when
we do only association via GLS with a fixed $p_b = 1$, which
demonstrates the benefit of doing the joint association and AF
optimization. The Joint RA-AF scheme performs worse (up to
8.45%) than the Joint GLS-AF algorithm at every iteration,
illustrating that the benefits of GLS over RRA observed before
at $p_b = 1$ are preserved even in the joint optimization problem.
For $\alpha = 0.5$, Joint GLS-AF performs 23.36% better than MSA
with $p_b = 1$, as compared to the gain of 4.6% obtained by GLS
over MSA observed in Table IV, again demonstrating the gain
of optimizing AF and the association jointly. We observe that
the Joint GLS-AF & Joint RA-AF algorithms perform very close
to each other in the $\alpha < 1$ regime. This is because of the similar
performance of the GLS and RRA schemes in this $\alpha$ regime.

¹ This problem is equivalent to the constrained maximization of a
submodular set function albeit where that set function is not
non-negative and non-decreasing, so that the classical result [30] is
inapplicable.


C. Result Verification with Fast Fading

Finally, in this section we incorporate fast fading and
efficient per-slot user scheduling to assess the benefits of the
association and activation fractions calculated using the proposed
Joint GLS-AF algorithm. In particular, we assume that each
frame comprises 5000 slots and model all fast fading
coefficients seen by each user on each slot as i.i.d. complex
normal $\mathcal{CN}(0,1)$ variables. We randomly generate an ON-OFF
pattern (for slots across each frame) for each TP that is
compliant with its assigned activation fraction. Further, each
TP employs the per-slot gradient based scheduling policy [29]
over the set of users associated to it in order to maximize
the utility. Then, using the actual per-user average rates so
obtained, we compute the system utility values for different
schemes. For $\alpha = 0.5$ we observed that the Joint GLS-AF
scheme yields a 15.35% gain over the baseline scheme (MSA
with $p_b = 1$), while the gain of the GLS with $p_b = 1$ over
the baseline is 5.32%. For $\alpha = 3$ the gains of these two
schemes over the baseline are 47.8% and 39.4%, respectively.
This validates that our approach to obtain the association and
AF does indeed result in significant gains in the presence of
fast fading and efficient fine time-scale (per-slot) scheduling.

VII. Conclusion

We analyzed and evaluated novel association and activation
fraction optimization algorithms for maximizing the alpha-fairness
utility in HetNets. We derived useful performance
guarantees and demonstrated the significant benefits of our
proposed algorithms over a practical HetNet topology.

Appendix

We capture some basic definitions that are used in this paper.

Definition 1. Given a ground set $\Omega$, we define its power set
(i.e., the set containing all the subsets of $\Omega$) as $2^{\Omega}$. Then, a
real-valued function defined on the subsets of $\Omega$, $h : 2^{\Omega} \to \mathbb{R}$,
is normalized if $h(\emptyset) = 0$, where $\emptyset$ denotes the empty set. It
is called a submodular set function if and only if

\[
h(B \cup a) - h(B) \le h(A \cup a) - h(A), \quad \forall A \subseteq B \subseteq \Omega \ \&\ a \in \Omega \setminus B,
\]

and a supermodular set function if and only if

\[
h(B \cup a) - h(B) \ge h(A \cup a) - h(A), \quad \forall A \subseteq B \subseteq \Omega \ \&\ a \in \Omega \setminus B.
\]

A non-negative valued set function $h : 2^{\Omega} \to \mathbb{R}_{+}$ is a
non-decreasing set function if and only if it satisfies $0 \le h(A) \le
h(B)$, $\forall A \subseteq B \subseteq \Omega$.

Definition 2. $(\Omega, \mathcal{I})$ is said to be a partition matroid when
there exists a partition $\Omega = \cup_{i=1}^{J} \Omega_i$, where $\Omega_i \cap \Omega_j = \emptyset$, $\forall i \neq j$,
along with integers $n_i \ge 1\ \forall i$ such that

\[
B \subseteq \Omega : |B \cap \Omega_i| \le n_i\ \forall i \iff B \in \mathcal{I}. \tag{24}
\]

Proof of (8)

We will show in brief that for each TP $b \in \mathcal{B}$


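\[
\max_{\gamma_{k,b} \ge 0\ \forall k,\ \sum_{k \in \mathcal{U}} \gamma_{k,b} = 1} \left\{ \sum_{k \in \mathcal{U}} x_{k,b}\, w_k\, u\!\left(\gamma_{k,b} R_{k,b}(\mathbf{p})\right) \right\} = -\frac{\left( \sum_{k \in \mathcal{U}} x_{k,b}\, w_k^{1/\alpha}\, R_{k,b}(\mathbf{p})^{1/\alpha - 1} \right)^{\alpha}}{\alpha - 1} \tag{25}
\]

The Lagrangian for the convex optimization problem stated
above is given by

\[
-\sum_{k \in \mathcal{U}} \frac{x_{k,b}\, w_k \left(\gamma_{k,b} R_{k,b}(\mathbf{p})\right)^{1-\alpha}}{\alpha - 1} + \mu \left( 1 - \sum_{k \in \mathcal{U}} \gamma_{k,b} \right) + \sum_{k \in \mathcal{U}} \lambda_k\, \gamma_{k,b} \tag{26}
\]

Using the first order derivative conditions and complementary
slackness conditions, it is seen that the objective attains its
maximum value when for each user $k : x_{k,b} = 1$, $\lambda_k = 0$ so
that $\gamma_{k,b} > 0$, and the following conditions are satisfied.

\[
w_k\, (\gamma_{k,b})^{-\alpha}\, R_{k,b}(\mathbf{p})^{1-\alpha} = \mu, \ \forall k : x_{k,b} = 1; \qquad \sum_{k \in \mathcal{U} : x_{k,b} = 1} \gamma_{k,b} = 1 \tag{27}
\]

Solving for the optimal $\gamma_{k,b}$ from (27) and putting its value back
in the objective, we obtain the RHS of (8).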
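The stationarity condition can be checked numerically for a single TP. The sketch below assumes the alpha-fair utility $u(x) = x^{1-\alpha}/(1-\alpha)$ with $\alpha \neq 1$ and solves the condition by taking each share proportional to $(w_k R_k^{1-\alpha})^{1/\alpha}$. The function names and the sample weights and rates are illustrative.

```python
def optimal_time_shares(weights, rates, alpha):
    # Stationarity: w_k * gamma_k**(-alpha) * R_k**(1 - alpha) is equal for all users,
    # hence gamma_k is proportional to (w_k * R_k**(1 - alpha)) ** (1 / alpha).
    scores = [(w * r ** (1.0 - alpha)) ** (1.0 / alpha) for w, r in zip(weights, rates)]
    total = sum(scores)
    return [s / total for s in scores]

def tp_utility(weights, rates, shares, alpha):
    # Weighted alpha-fair utility of one TP (assumed form, alpha != 1).
    return sum(w * (g * r) ** (1.0 - alpha) / (1.0 - alpha)
               for w, r, g in zip(weights, rates, shares))

def closed_form_value(weights, rates, alpha):
    # Value obtained by substituting the optimal shares back into the utility.
    total = sum((w * r ** (1.0 - alpha)) ** (1.0 / alpha) for w, r in zip(weights, rates))
    return total ** alpha / (1.0 - alpha)

w, r, a = [0.5, 0.3, 0.2], [1.0, 2.0, 4.0], 2.0
shares = optimal_time_shares(w, r, a)
best = tp_utility(w, r, shares, a)
```

Moving a small amount of share from one user to another lowers `tp_utility`, as expected from concavity of the utility in the shares.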


Proposition 1:

Hardness of User Association: The hardness of the user association
sub-problem for a fixed $\mathbf{p}$ can be shown via a reduction
from the partition problem. To show this, we consider the case
$\alpha > 1$ and suppose that there is an optimal polynomial time
user association algorithm. Further, we restrict ourselves to
input instances in which the rates that all users can obtain
from two distinct TPs $b_1, b_2 \in \mathcal{B}$ are identical to one, whereas
the rate that each user can obtain from any other TP is zero.
Thus, we assume that $R_{k,b}(\mathbf{p}) = 1$, $\forall k \in \mathcal{U}$ & $b \in \{b_1, b_2\}$
while $R_{k,b}(\mathbf{p}) = 0$, $\forall k \in \mathcal{U}$ & $b \in \mathcal{B} \setminus \{b_1, b_2\}$. We allow the
user weights to be any input set of positive scalars that sum
to 1. Then, the problem in (9) simplifies to

\[
\min_{\substack{x_{k,b} \in \{0,1\}\ \forall k \in \mathcal{U},\, b \in \{b_1, b_2\} \\ \sum_{b \in \{b_1, b_2\}} x_{k,b} = 1\ \forall k}} \left\{ \sum_{b \in \{b_1, b_2\}} \left( \sum_{k \in \mathcal{U}} x_{k,b}\, w_k^{1/\alpha} \right)^{\alpha} \right\} \tag{28}
\]

Then, defining $\hat{z} = \arg\min_{z \in [0,1]} \{z^{\alpha} + (1 - z)^{\alpha}\}$, it is
readily verified that $\hat{z}$ is unique and equal to 1/2, with
$\hat{z}^{\alpha} + (1 - \hat{z})^{\alpha} = 2^{1-\alpha}$. Letting $W = \sum_{k \in \mathcal{U}} w_k^{1/\alpha}$, this
implies that the objective value in (28) returned by the optimal
polynomial time user association algorithm will be equal to
$W^{\alpha} 2^{1-\alpha}$ if and only if there exists a partition of the set of user
weights (each raised to power $1/\alpha$) into two parts that have an
identical sum. This in turn implies that the algorithm at hand
is an optimal polynomial time algorithm for the NP-complete
partition problem. Indeed, suppose $\{y_1, \cdots, y_K\} : y_k > 0, \forall k$
is any input set to the latter problem where we need to
determine if there exists a partition of that set into two parts
of identical sum. Setting $w_k = y_k^{\alpha} / \sum_{j=1}^{K} y_j^{\alpha}$, $\forall k = 1, \cdots, K$,
we obtain a valid input set of weights for (28). Then, from
the output of the supposed optimal algorithm at hand, we can
immediately determine if there is such a partition for the set
$\{ y_k / (\sum_{j=1}^{K} y_j^{\alpha})^{1/\alpha} \}_{k=1}^{K}$ and thus the set $\{y_k\}_{k=1}^{K}$, which yields the
desired contradiction. The same reduction can be established
for $\alpha = 1$ as well as $\alpha \in (0,1)$.
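The reduction can be illustrated on tiny instances by brute force. The sketch below assumes the two-TP objective is the sum over both TPs of (sum of $w_k^{1/\alpha}$ over the users on that TP) raised to the power $\alpha$, and compares the optimum with $W^{\alpha} 2^{1-\alpha}$. The function names are hypothetical, and the exhaustive search is exponential, so it only serves as a check.

```python
from itertools import product

def best_two_tp_objective(weights, alpha):
    # Exhaustive search over all assignments of users to the two TPs.
    gains = [w ** (1.0 / alpha) for w in weights]
    best = float("inf")
    for choice in product((0, 1), repeat=len(gains)):
        load = [0.0, 0.0]
        for g, b in zip(gains, choice):
            load[b] += g
        best = min(best, load[0] ** alpha + load[1] ** alpha)
    return best

def has_equal_partition(values, alpha=2.0, tol=1e-9):
    # Map partition inputs y_k to weights w_k = y_k**alpha / sum_j y_j**alpha.
    total = sum(v ** alpha for v in values)
    weights = [v ** alpha / total for v in values]
    big_w = sum(w ** (1.0 / alpha) for w in weights)
    target = big_w ** alpha * 2.0 ** (1.0 - alpha)
    # The optimum equals the target exactly when an equal-sum split exists.
    return abs(best_two_tp_objective(weights, alpha) - target) <= tol * target

print(has_equal_partition([1, 2, 3]), has_equal_partition([1, 2, 4]))
```

For `[1, 2, 3]` the split {1, 2} versus {3} attains the target, whereas `[1, 2, 4]` admits no equal-sum split and the optimum stays strictly above it.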

To prove the remaining parts of this proposition, we note
that $x^{\alpha}$ for all non-negative $x$ is concave in $x$ when $\alpha \in (0,1)$
and convex in $x$ when $\alpha > 1$. Then, we note the fact that
composition of a non-negative modular set function with a
concave (convex) function yields a submodular (supermodular)
set function. Further, submodularity as well as supermodularity
is preserved under set restriction and the sum of submodular
(supermodular) functions is submodular (supermodular). Using
these facts, we obtain the desired results. Similarly, for
$\alpha = 1$ we note that $-x \ln(x)$ is concave in $x$ for all non-negative
$x$. This fact along with the aforementioned arguments
and the fact that the sum of a submodular set function and a
modular set function is submodular, establishes the proof in
this case. Finally, since we allow for arbitrarily small (albeit
positive) $R_{k,b}(\mathbf{p})$ for any tuple $(k,b)$, the set function $g(\cdot, 1)$
need not be non-decreasing nor non-negative.

Before we consider Proposition 2, we state and prove a
lemma that will be invoked later. The bounds given in this lemma
are applicable to arbitrary submodular or supermodular set
functions.

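Lemma 1:

For any given $\alpha$, the greedy stage yields an output $\underline{G}$ such that

\[
\begin{aligned}
g(\underline{G}, \alpha) &\ge g(\underline{G}^{\rm opt} \cup \underline{G}, \alpha) - g(\underline{G} \setminus \underline{G}^{\rm opt}, \alpha), && \forall \alpha \in (0,1] \\
g(\underline{G}, \alpha) &\le g(\underline{G}^{\rm opt} \cup \underline{G}, \alpha) - g(\underline{G} \setminus \underline{G}^{\rm opt}, \alpha), && \forall \alpha > 1.
\end{aligned} \tag{29}
\]

Proof. We prove the first relation in (29). For notational
convenience let us denote a tuple as $e = (k, b)$. We expand $\underline{G}$
as $\underline{G} = \{e_1, e_2, \cdots, e_K\}$ where $e_i$ denotes the tuple added
at the $i$-th greedy step and let $\delta_i$, $i = 1, \cdots, K$ denote
the associated change in system utility. Further, we define
the sets $\underline{G}_i = \{e_1, e_2, \cdots, e_i\}$, $\forall i = 1, \cdots, K$ with
$\underline{G}_0 = \emptyset$. Then, note that both $\underline{G}^{\rm opt}, \underline{G} \in \mathcal{I}$ and are maximal
members in $\mathcal{I}$, i.e., $|\underline{G}^{\rm opt}| = |\underline{G}| = K$. Further, using the
definitions given above, we see that $(\Omega, \mathcal{I})$ is a partition matroid.
Invoking a result on maximal members in a matroid (cf.
[31]), we can deduce that without loss of generality, we
can expand $\underline{G}^{\rm opt} = \{e_1^{\rm opt}, e_2^{\rm opt}, \cdots, e_K^{\rm opt}\}$ such that for each
$i \in \{1, \cdots, K\}$

\[
\text{Either } e_i^{\rm opt} = e_i, \ \text{ or } \ \underline{G}_{i-1} \cup e_i^{\rm opt} \in \mathcal{I}. \tag{30}
\]

Then, letting $\underline{\hat{G}} = \underline{G} \setminus \underline{G}^{\rm opt}$ we have the following chain of
inequalities, which yields the desired result.

\[
\begin{aligned}
g(\underline{G}, \alpha) = \sum_{i=1}^{K} \delta_i
&= \sum_{i : e_i \in \underline{G} \cap \underline{G}^{\rm opt}} \left( g(\underline{G}_{i-1} \cup e_i, \alpha) - g(\underline{G}_{i-1}, \alpha) \right) + \sum_{i : e_i \in \underline{\hat{G}}} \left( g(\underline{G}_{i-1} \cup e_i, \alpha) - g(\underline{G}_{i-1}, \alpha) \right) \\
&\ge \sum_{i : e_i \in \underline{G} \cap \underline{G}^{\rm opt}} \left( g(\underline{G}_{i-1} \cup \underline{\hat{G}} \cup e_i, \alpha) - g(\underline{G}_{i-1} \cup \underline{\hat{G}}, \alpha) \right) + \sum_{i : e_i \in \underline{\hat{G}}} \left( g(\underline{G}_{i-1} \cup e_i, \alpha) - g(\underline{G}_{i-1}, \alpha) \right) \\
&= g(\underline{G}, \alpha) - g(\underline{\hat{G}}, \alpha) + \sum_{i : e_i \in \underline{\hat{G}}} \left( g(\underline{G}_{i-1} \cup e_i, \alpha) - g(\underline{G}_{i-1}, \alpha) \right) \\
&\ge g(\underline{G}, \alpha) - g(\underline{\hat{G}}, \alpha) + \sum_{i : e_i^{\rm opt} \notin \underline{G}} \left( g(\underline{G}_{i-1} \cup e_i^{\rm opt}, \alpha) - g(\underline{G}_{i-1}, \alpha) \right) \\
&\ge g(\underline{G}, \alpha) - g(\underline{\hat{G}}, \alpha) + \sum_{i : e_i^{\rm opt} \notin \underline{G}} \left( g(\underline{G} \cup e_i^{\rm opt}, \alpha) - g(\underline{G}, \alpha) \right) \\
&\ge g(\underline{G}, \alpha) - g(\underline{\hat{G}}, \alpha) + g(\underline{G}^{\rm opt} \cup \underline{G}, \alpha) - g(\underline{G}, \alpha).
\end{aligned} \tag{31}
\]

In (31) the first inequality follows from submodularity
of $g(\cdot, \alpha)$ and the fact that for each $i : e_i \in \underline{G} \cap \underline{G}^{\rm opt}$,
$\underline{G}_{i-1} \subseteq \underline{G}_{i-1} \cup \underline{\hat{G}}$ and $e_i \notin \underline{G}_{i-1} \cup \underline{\hat{G}}$. The second inequality
follows from (30) along with the fact that for each $i : e_i^{\rm opt} \notin \underline{G}$,
the greedy algorithm would have considered $e_i^{\rm opt}$ but chose
$e_i$ instead since the latter offered a better (greater) change
in system utility. The third inequality also follows from
submodularity of $g(\cdot, \alpha)$ and the fact that for each $i : e_i^{\rm opt} \notin \underline{G}$
we have $\underline{G}_{i-1} \subseteq \underline{G}$, and the final inequality also follows from
submodularity of $g(\cdot, \alpha)$. Note that none of the steps require
$g(\cdot, \alpha)$ to be a non-negative set function or that the changes
in system utility should be non-negative. The second relation
in (29) can be proved in an analogous fashion. □


Proposition 2:

For any given $\alpha$, the greedy stage yields an output $\underline{G}$ such that

\[
\begin{aligned}
g(\underline{G}, \alpha) &\ge g(\underline{G}^{\rm opt}, \alpha)/2 && \forall \alpha \in (0,1), \\
g(\underline{G}, \alpha) &\ge g(\underline{G}^{\rm opt}, \alpha) - 2\ln(2) && \forall \alpha = 1, \\
(3 - 2^{\alpha})\, g(\underline{G}, \alpha) &\le g(\underline{G}^{\rm opt}, \alpha) && \forall \alpha > 1.
\end{aligned} \tag{32}
\]

Proof. For $\alpha \in (0,1)$, since $g(\cdot, \alpha)$ is submodular and
non-decreasing, we can readily obtain (32) from (29) by observing
that $g(\underline{G}^{\rm opt} \cup \underline{G}, \alpha) \ge g(\underline{G}^{\rm opt}, \alpha)$ and $g(\underline{G}, \alpha) \ge g(\underline{G} \setminus \underline{G}^{\rm opt}, \alpha)$.
Note that in this regime (32) is the classical result derived earlier [30]. For
$\alpha = 1$, the result in (32) is novel and thus more interesting.
To prove (32), we first re-write the bound in (29) as

\[
g(\underline{G}, 1) \ge g(\underline{G}^{\rm opt}, 1) + g(\underline{G}^{\rm opt} \cup \underline{G}, 1) - g(\underline{G}^{\rm opt}, 1) - g(\underline{G} \setminus \underline{G}^{\rm opt}, 1). \tag{33}
\]

Then, recall from (14) that $g(\cdot, 1)$ is the sum of a modular
function and a submodular function where the latter depends
only on the user weights, and the sum of these weights across
all users is unity. Consequently, we can infer that

\[
\begin{aligned}
g(\underline{G}^{\rm opt} \cup \underline{G}, 1) - g(\underline{G}^{\rm opt}, 1) - g(\underline{G} \setminus \underline{G}^{\rm opt}, 1)
= & -\sum_{b} (x_b + y_b) \ln(x_b + y_b) + \sum_{b} (z_b + y_b) \ln(z_b + y_b) \\
& + \sum_{b} (x_b - z_b) \ln(x_b - z_b)
\end{aligned} \tag{34}
\]

where $x_b$ is the sum of weights of users associated to TP $b$ by
the greedy solution (and hence is known), $y_b + z_b$ is the sum
of weights of users associated to TP $b$ by the optimal solution
and $z_b$ is the sum of weights of users associated to TP $b$ by
both the greedy and the optimal solutions. Note further that
$\sum_b x_b = \sum_b (y_b + z_b) = 1$. Combining (34) with (33) we can
obtain the following specialized bound,

\[
\begin{aligned}
g(\underline{G}, 1) \ge g(\underline{G}^{\rm opt}, 1) + \min_{\substack{y_b, z_b \ge 0,\ z_b \le x_b\ \forall b \\ \sum_b (y_b + z_b) = 1}} \Big\{ & -\sum_{b} (x_b + y_b) \ln(x_b + y_b) + \sum_{b} (z_b + y_b) \ln(z_b + y_b) \\
& + \sum_{b} (x_b - z_b) \ln(x_b - z_b) \Big\}.
\end{aligned} \tag{35}
\]

Then, by using the K.K.T. conditions for the optimization
problem in the RHS of (35), it can be shown that the minimum
is attained at $y_b = x_b$ & $z_b = 0\ \forall b$ so that

\[
\min_{\substack{y_b, z_b \ge 0,\ z_b \le x_b\ \forall b \\ \sum_b (y_b + z_b) = 1}} \Big\{ -\sum_{b} (x_b + y_b) \ln(x_b + y_b) + \sum_{b} (z_b + y_b) \ln(z_b + y_b) + \sum_{b} (x_b - z_b) \ln(x_b - z_b) \Big\} = -2\ln(2).
\]

This proves the result in (32). Next, we consider $\alpha > 1$ and
specialize the bound in (29) as

\[
\begin{aligned}
g(\underline{G}, \alpha) &\le g(\underline{G}^{\rm opt}, \alpha) + g(\underline{G}^{\rm opt} \cup \underline{G}, \alpha) - g(\underline{G}^{\rm opt}, \alpha) - g(\underline{G} \setminus \underline{G}^{\rm opt}, \alpha) \\
&= g(\underline{G}^{\rm opt}, \alpha) + \sum_{b} \left( (v_b + t_b)^{\alpha} - (v_b + u_b)^{\alpha} - (t_b - u_b)^{\alpha} \right),
\end{aligned} \tag{36}
\]

where now $t_b$ is the sum of gains of all users associated to TP $b$
by the greedy solution (i.e., the sum of $\theta_{k,b}(\alpha)$ for all tuples
in $\underline{G}$ that involve TP $b$, and hence is known) so that $g(\underline{G}, \alpha) = \sum_b t_b^{\alpha}$.
$v_b + u_b$ is the sum of gains of all users associated to TP $b$
by the optimal solution and $u_b$ is the sum of gains of all
users associated to TP $b$ by both the greedy and the optimal
solutions. Clearly, then we can further bound

\[
g(\underline{G}, \alpha) \le g(\underline{G}^{\rm opt}, \alpha) + \max_{\substack{v_b, u_b \ge 0;\ u_b \le t_b\ \forall b \\ \sum_b (u_b + v_b)^{\alpha} \le g(\underline{G}, \alpha)}} \Big\{ \sum_{b} \left( (v_b + t_b)^{\alpha} - (v_b + u_b)^{\alpha} - (t_b - u_b)^{\alpha} \right) \Big\} \tag{37}
\]

Again invoking the K.K.T. conditions for the optimization
problem in the RHS of (37), it can be shown that the maximum
is attained at $v_b = t_b$ & $u_b = 0\ \forall b$ so that

\[
\max_{\substack{v_b, u_b \ge 0;\ u_b \le t_b\ \forall b \\ \sum_b (u_b + v_b)^{\alpha} \le g(\underline{G}, \alpha)}} \Big\{ \sum_{b} \left( (v_b + t_b)^{\alpha} - (v_b + u_b)^{\alpha} - (t_b - u_b)^{\alpha} \right) \Big\} = (2^{\alpha} - 2)\, g(\underline{G}, \alpha) \tag{38}
\]

This then proves the result in (32). □


Proposition 3:

The GLS algorithm for any given $\Delta > 0$ yields an output $\underline{G}$
such that for any given $\alpha > 1$

\[
g(\underline{G}^{\rm opt}, \alpha) \ge g(\underline{G}, \alpha) + K(1 - \Delta)\, g(\underline{G}, \alpha) - h(\underline{G}, \alpha)
\]

and for any given $\alpha \in (0,1)$

\[
g(\underline{G}^{\rm opt}, \alpha) \le g(\underline{G}, \alpha) + K(1 + \Delta)\, g(\underline{G}, \alpha) - h(\underline{G}, \alpha).
\]

Further, for $\alpha = 1$

\[
g(\underline{G}^{\rm opt}, 1) \le g(\underline{G}, 1) + K\left(1 + \Delta\, \mathrm{sgn}(g(\underline{G}, 1))\right) g(\underline{G}, 1) - h(\underline{G}, 1), \tag{39}
\]

where $h(\underline{G}, \alpha) = \sum_{n=1}^{K} g(\underline{G} \setminus e_n, \alpha) + \sum_{n=1}^{K} \left( g(\hat{\Omega}, \alpha) - g(\hat{\Omega} \setminus e_n, \alpha) \right)$,
for any subset $\hat{\Omega} \subseteq \Omega : \underline{G}^{\rm opt} \cup \underline{G} \subseteq \hat{\Omega}$.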

Proof. We prove the result for $\alpha > 1$ and the result for
$\alpha$ in other regimes can be derived similarly. We again invoke
a result on maximal members in a matroid [31], to
deduce that without loss of generality, we can expand $\underline{G} =
\{e_1, e_2, \cdots, e_K\}$ and expand $\underline{G}^{\rm opt} = \{e_1^{\rm opt}, e_2^{\rm opt}, \cdots, e_K^{\rm opt}\}$
such that for some $m \in \{0, 1, \cdots, K\}$,

\[
\begin{aligned}
& e_n^{\rm opt} = e_n, \ \forall n \le m \quad \& \quad e_n^{\rm opt} \neq e_n, \ \forall n > m \\
& (\underline{G} \setminus e_n) \cup e_n^{\rm opt} \in \mathcal{I}, \ \forall n : m + 1 \le n \le K.
\end{aligned} \tag{40}
\]

Then, we have the following inequalities for each $n = m + 1, \cdots, K$.

\[
\begin{aligned}
g(\underline{G} \cup e_n^{\rm opt}, \alpha) - g(\underline{G}, \alpha) &\ge g((\underline{G} \setminus e_n) \cup e_n^{\rm opt}, \alpha) - g(\underline{G} \setminus e_n, \alpha) \\
&\ge (1 - \Delta)\, g(\underline{G}, \alpha) - g(\underline{G} \setminus e_n, \alpha)
\end{aligned} \tag{41}
\]

where the first inequality follows from the supermodularity
of $g(\cdot, \alpha)$ and the second one follows from the local swap
optimality of $\underline{G}$, i.e.,

\[
g((\underline{G} \setminus e_n) \cup e_n^{\rm opt}, \alpha) - g(\underline{G}, \alpha) \ge -\Delta\, g(\underline{G}, \alpha). \tag{42}
\]

Thus, we have that

\[
\sum_{n=m+1}^{K} \left( g(\underline{G} \cup e_n^{\rm opt}, \alpha) - g(\underline{G}, \alpha) \right) \ge \sum_{n=m+1}^{K} \left( (1 - \Delta)\, g(\underline{G}, \alpha) - g(\underline{G} \setminus e_n, \alpha) \right) \tag{43}
\]

and due to the supermodularity of $g(\cdot, \alpha)$

\[
\begin{aligned}
\sum_{n=m+1}^{K} \left( g(\underline{G} \cup e_n^{\rm opt}, \alpha) - g(\underline{G}, \alpha) \right)
&\le \sum_{n=m+1}^{K} \left( g(\underline{G} \cup \{e_{m+1}^{\rm opt}, \cdots, e_n^{\rm opt}\}, \alpha) - g(\underline{G} \cup \{e_{m+1}^{\rm opt}, \cdots, e_{n-1}^{\rm opt}\}, \alpha) \right) \\
&= g(\underline{G} \cup \underline{G}^{\rm opt}, \alpha) - g(\underline{G}, \alpha).
\end{aligned} \tag{44}
\]

Next, we have the bound

\[
\begin{aligned}
g(\underline{G} \cup \underline{G}^{\rm opt}, \alpha)
&= g(\underline{G}^{\rm opt}, \alpha) + \sum_{n=m+1}^{K} \left( g(\underline{G}^{\rm opt} \cup \{e_{m+1}, \cdots, e_n\}, \alpha) - g(\underline{G}^{\rm opt} \cup \{e_{m+1}, \cdots, e_{n-1}\}, \alpha) \right) \\
&\le g(\underline{G}^{\rm opt}, \alpha) + \sum_{n=m+1}^{K} \left( g(\hat{\Omega}, \alpha) - g(\hat{\Omega} \setminus e_n, \alpha) \right),
\end{aligned} \tag{45}
\]

for any subset $\hat{\Omega} \subseteq \Omega : \underline{G}^{\rm opt} \cup \underline{G} \subseteq \hat{\Omega}$. Combining the bounds
in (43), (44) and (45) we get

\[
g(\underline{G}^{\rm opt}, \alpha) \ge g(\underline{G}, \alpha) + \sum_{n=m+1}^{K} \left( (1 - \Delta)\, g(\underline{G}, \alpha) - g(\underline{G} \setminus e_n, \alpha) \right) - \sum_{n=m+1}^{K} \left( g(\hat{\Omega}, \alpha) - g(\hat{\Omega} \setminus e_n, \alpha) \right). \tag{46}
\]

The RHS of (46) is further lower bounded to obtain the desired
result in (39), by extending the summation from 1 to $K$, where
we note that each term $\left( (1 - \Delta)\, g(\underline{G}, \alpha) - g(\underline{G} \setminus e_n, \alpha) \right) -
\left( g(\hat{\Omega}, \alpha) - g(\hat{\Omega} \setminus e_n, \alpha) \right) \le 0$ since $\Delta > 0$ and $g(\cdot, \alpha)$ is
supermodular and non-negative. □

Proposition 4:

For non-negative non-decreasing submodular set functions,
which we recall does not hold for our set functions when
$\alpha > 1$, a somewhat lesser known result is that a restricted
version of the greedy algorithm can also yield an identical
constant factor approximation [32]. We next establish a similar
result with respect to the bounds in Lemma 1 and Proposition
2. In particular, we first detail the restricted greedy algorithm
in Table VI. Next, we show that for any given ordering $\pi(\cdot)$,
the restricted greedy algorithm yields a solution that also
satisfies the bounds in Lemma 1 for all $\alpha$. Thus, the solution
of the restricted greedy algorithm also satisfies the bounds
in Proposition 2 for all $\alpha$ and hence yields the same firm
guarantees over the corresponding range of $\alpha$.

Table VI: Restricted Greedy Algorithm
1: Initialize with any ordering $\pi(\cdot)$ defined on $\mathcal{U}$ and $\underline{G}^{\rm rg} = \emptyset$.
2: For $k = 1$ to $K$,
3:     Determine $(\pi(k), b')$ as the tuple in $\Omega$ which offers the
       best change among all tuples $(\pi(k), b) \in \Omega$.
4:     Update $\underline{G}^{\rm rg} = \underline{G}^{\rm rg} \cup (\pi(k), b')$.
   End For.
Output $\underline{G}^{\rm rg}$

Towards this end, we expand
the solution yielded by the restricted greedy algorithm as
$\underline{G}^{\rm rg} = \{e_1^{{\rm rg},\pi}, e_2^{{\rm rg},\pi}, \cdots, e_K^{{\rm rg},\pi}\}$ where $e_i^{{\rm rg},\pi}$ denotes the tuple
added at the $i$-th step as per the ordering $\pi(\cdot)$. Then, notice
that all the arguments in the proof of Lemma 1 go through
even upon replacing $\underline{G}$ with $\underline{G}^{\rm rg}$ and $e_i$ with $e_i^{{\rm rg},\pi}$, $\forall i$. The
key point to note here is that we do not require the changes
in system utility obtained across the steps to be ordered. In
other words, we do not use the fact that these changes obtained
during the greedy stage of the GLS algorithm are ordered as
$\delta_1 \ge \delta_2 \ge \cdots \ge \delta_K$ when $\alpha < 1$ or as $\delta_1 \le \delta_2 \le \cdots \le \delta_K$
when $\alpha > 1$, whereas no such ordering is ensured for those
obtained during the restricted greedy algorithm.

Notice that the aforementioned result applies to any
ordering $\pi(\cdot)$. We will exploit this fact along with a result
that the solution yielded by the distributed greedy algorithm
maps exactly to that yielded by the restricted greedy algorithm
for a particular ordering. We will suppose that $\alpha < 1$ since the
arguments we make directly extend to the case where $\alpha > 1$.
Let $e_1^{\rm dg}, \cdots, e_K^{\rm dg}$ be the tuples selected by the distributed
greedy algorithm, where we assume that tuples $e_1^{\rm dg}, \cdots, e_{m_1}^{\rm dg}$
are selected in the first window, tuples $e_{m_1+1}^{\rm dg}, \cdots, e_{m_2}^{\rm dg}$ are
selected in the second window and so on. Moreover, let
$u_1, u_2, \cdots, u_{m_1}$ denote the corresponding users in the tuples
selected in the first window, let $u_{m_1+1}, u_{m_1+2}, \cdots, u_{m_2}$
denote the corresponding users in the tuples selected in the
second window and so on. We define an ordering $\pi(\cdot)$ such
that $\pi(k) = u_k$, $k = 1, \cdots, K$. Note here that we can pick
any arbitrary order to list the users (tuples) selected by the
distributed greedy algorithm within each window. We will
show that

\[
e_k^{{\rm rg},\pi} = e_k^{\rm dg}, \ \forall k = 1, \cdots, K \tag{47}
\]

which proves the desired result. Consider the tuples selected
in the first window. Each user $u_i$, $i = 1, \cdots, m_1$ chooses the
TP yielding the best change in system utility assuming zero
current load on all TPs. Thus, it is readily seen that $e_1^{{\rm rg},\pi} =
e_1^{\rm dg}$. Consider the TP choice of user $u_i$, $i = 2, \cdots, m_1$ made
as $e_i^{\rm dg} = \arg\max_{(u_i, b),\, b \in \mathcal{B}} \{g((u_i, b), \alpha)\}$. By submodularity
of $g(\cdot, \alpha)$ for $\alpha < 1$ and the fact that the TPs chosen by the
admitted users in each window are all distinct, we have that

\[
e_i^{\rm dg} = \arg\max_{(u_i, b),\, b \in \mathcal{B}} \left\{ g(\{e_1^{\rm dg} \cup \cdots \cup e_{i-1}^{\rm dg}\} \cup (u_i, b), \alpha) - g(\{e_1^{\rm dg} \cup \cdots \cup e_{i-1}^{\rm dg}\}, \alpha) \right\} \tag{48}
\]

Put differently, given that tuples $\{e_1^{\rm dg} \cup \cdots \cup e_{i-1}^{\rm dg}\}$ have been
already chosen, the best TP for user $u_i$ will still be the one in
$e_i^{\rm dg}$. This is because upon selecting the tuples $\{e_1^{\rm dg} \cup \cdots \cup e_{i-1}^{\rm dg}\}$
the loads of the TPs in these tuples will increase, whereas that
of the one in $e_i^{\rm dg}$ will remain unchanged. Thus, the system
utility change obtained if user $u_i$ joined each one of those TPs
(given these selections) will be inferior, respectively, to what
that user assumed when making its decision (since it used a
lower value of the load). On the other hand, the system utility
change obtained if user $u_i$ joined the TP in $e_i^{\rm dg}$ (given that
tuples $\{e_1^{\rm dg} \cup \cdots \cup e_{i-1}^{\rm dg}\}$ have been already selected) will be
identical to what it assumed. Then, from (48) we have that
$e_i^{{\rm rg},\pi} = e_i^{\rm dg}$, $\forall i = 1, \cdots, m_1$. The same argument applies to
each subsequent window upon observing that all users that are
selected in that window use load values that account for all
associations made in all prior windows. Thus, we can conclude
that (47) is true, which proves our claim for the distributed
greedy algorithm.

In this context, we note that another distributed greedy 
algorithm can be obtained by altering the TP-side procedure 
to one where in each window each TP admits only the user 
offering the best change among all users that have requested 
it in that window. From the proof detailed above, it can be 
verified that this variant also yields identical performance 
guarantees. 
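A restricted greedy pass over a fixed user ordering can be sketched as follows. The sketch assumes the set function has the form "sum over TPs of (sum of gains of the users on that TP) raised to $\alpha$", maximized for $\alpha < 1$ and minimized for $\alpha > 1$. The gain matrix `theta`, the function name and the tie-breaking rule (lowest TP index) are illustrative assumptions, and the $\alpha = 1$ case is not covered.

```python
from typing import Dict, List, Sequence

def restricted_greedy(theta: List[List[float]], order: Sequence[int],
                      alpha: float) -> Dict[int, int]:
    """theta[k][b] is the (assumed) gain of user k at TP b; order lists the users."""
    num_tps = len(theta[0])
    loads = [0.0] * num_tps            # sum of gains already placed on each TP
    assoc: Dict[int, int] = {}
    for k in order:                    # users are visited strictly in the given order
        def change(b: int) -> float:
            # Change in the set function if user k is added to TP b.
            return (loads[b] + theta[k][b]) ** alpha - loads[b] ** alpha
        if alpha < 1.0:
            b_best = max(range(num_tps), key=change)   # utility is maximized
        else:
            b_best = min(range(num_tps), key=change)   # cost is minimized
        assoc[k] = b_best
        loads[b_best] += theta[k][b_best]
    return assoc

balanced = restricted_greedy([[1.0, 1.0]] * 3, order=[0, 1, 2], alpha=2.0)
```

Unlike the unrestricted greedy stage, the user considered at each step is dictated by the ordering, and only the TP is chosen greedily.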

Distributed LS Stage: 

We will show that this distributed LS stage provably con¬ 
verges and the solution it yields upon convergence yields the 
same guarantees in Proposition [7| 

To prove this claim, we define a system state to be a 
feasible user association, i.e., an association where each user 
is associated to one TP. Thus, the set of all possible system 
states is finite and comprises all feasible user associations.
Let us define a system state to be an absorbing state if at 
that state, for each user the switch yielding the best change 
in system utility (15) does not yield a relative improvement
better than $\Delta$ (cf. (16) and (17)). Clearly, the optimal system
state (which yields the globally optimal system utility) is an 
absorbing state so that the set of absorbing states is finite and 
non-empty. Further, given any non-absorbing state it can be 
verified that we can construct a finite sequence of states that 
begins at the given state and ends at an absorbing one, such 
that each transition from any state to the next one in that
sequence involves a migration of exactly one user and yields
a relative improvement (in the system utility) better than $\Delta$.


Next, considering the distributed LS algorithm, it is readily
seen that the load information broadcast at the start of each
window corresponds to a system state.
Moreover, without loss of generality, we can assume that
each user which sends a request in any window is accepted
with a strictly positive probability that depends only on the
system state at the beginning of that window and the user
index. Consequently, the sequence of states seen across the
broadcast slots forms an absorbing, time-homogeneous Markov
chain. Hence, convergence to an absorbing state is guaranteed.
Indeed, the expected number of steps for convergence can be
obtained from the analysis in [33]. Finally, since the bound in
Proposition 3 is satisfied by any absorbing state, we can assert
that the claimed guarantee for the distributed LS algorithm holds.
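
As an illustration of how an expected number of steps to absorption can be computed for a finite absorbing Markov chain, the sketch below uses the standard linear system $(I - Q)\,t = \mathbf{1}$, where $Q$ is the transition matrix restricted to the transient states. This is the textbook fundamental-matrix result and is offered only as a hedged example; it is not claimed to be the specific analysis of [33]. The function name `expected_steps_to_absorption` and the three-state chain are hypothetical.

```python
def expected_steps_to_absorption(P, absorbing):
    """Expected steps to absorption from each transient state.

    P is a row-stochastic matrix (list of lists); `absorbing` is the set of
    absorbing state indices. Solves (I - Q) t = 1 by Gauss-Jordan elimination.
    """
    transient = [s for s in range(len(P)) if s not in absorbing]
    n = len(transient)
    # Augmented matrix [I - Q | 1] over the transient states.
    A = [[(1.0 if i == j else 0.0) - P[s][r] for j, r in enumerate(transient)]
         + [1.0]
         for i, s in enumerate(transient)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return {s: A[i][n] for i, s in enumerate(transient)}

# Hypothetical chain: state 0 -> 1 w.p. 0.5, state 1 -> 2 w.p. 0.25, state 2 absorbing.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.75, 0.25],
     [0.0, 0.0, 1.0]]
steps = expected_steps_to_absorption(P, absorbing={2})
```

For this toy chain the expected numbers of steps are 6 from state 0 and 4 from state 1, which can be checked by hand from $t_1 = 1/0.25$ and $t_0 = 1/0.5 + t_1$.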


AF Optimization 


We first discuss a distributed implementation that ensures
no loss in performance. Towards this end, it is readily seen
that for any fixed activation vector $p$ the optimization over
$s, g$ decouples into smaller problems which can be separately
solved at each TP. We notice, however, that the AF variables
in the GP formulation in (23) induce coupling constraints.
Nevertheless, this issue can be addressed by exploiting a useful
decomposition technique from [34] and introducing local
copies for the AF variables. In particular, for each AF variable
$p_b$, we introduce $B - 1$ local copies $p_{b',b}$, $\forall\, b' \in \mathcal{B} : b' \neq b$
($p_{b',b}$ is the copy of $p_b$ maintained at TP $b'$) and re-write the
GP in (23) including these local copies along with equality
constraints $p_b = p_{b',b}$, $\forall\, b' \in \mathcal{B} : b' \neq b$, $\forall\, b \in \mathcal{B}$, as the
following.




beB 


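subject to

$$\begin{aligned}
& z_b^{-1} \sum_{k \in \mathcal{U}^{(b)}} w_k\, t_{k,b}^{1/\alpha - 1} \le 1, \quad \forall\, b \in \mathcal{B}, \\
& \frac{t_{k,b}\, p_b^{-1} + \mathbb{E}[s_{k,b}(\beta_k)\, e_{k,b}(\beta_k; p_b, \{p_{b,b'}\})]}{1 + \mathbb{E}[\log(s_{k,b}(\beta_k))]} \le 1, \quad \forall\, k \in \mathcal{U}^{(b)},\ b \in \mathcal{B}, \\
& p_{b'} = p_{b,b'}, \quad \forall\, b' \neq b \ \&\ b \in \mathcal{B}.
\end{aligned} \tag{49}$$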


The problem in (50) can be decomposed into smaller sub-problems
by using a Lagrange multiplier for each equality
constraint (a.k.a. consistency price variable). However, to
ensure that the sub-problems are also convex, we first adopt the
(usual) change of variables $\tilde{z}_b = \ln(z_b)$, $\tilde{t}_{k,b} = \ln(t_{k,b})$, $\forall\, k \in \mathcal{U}^{(b)}$,
$\tilde{p}_b = \ln(p_b)$ and $\tilde{p}_{b,b'} = \ln(p_{b,b'})$, $\forall\, b' \neq b$, for all $b \in \mathcal{B}$.
Then, we note that the equality constraints can be written as
$\tilde{p}_{b'} = \tilde{p}_{b,b'}$ for all $b' \neq b$ & $b \in \mathcal{B}$. This transformed problem
is presented below

beB 

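subject to

$$\begin{aligned}
& \ln\Big( \sum_{k \in \mathcal{U}^{(b)}} w_k \exp\big(-\tilde{z}_b + (1/\alpha - 1)\,\tilde{t}_{k,b}\big) \Big) \le 0, \quad \forall\, b \in \mathcal{B}, \\
& \ln\Big( \frac{\exp(\tilde{t}_{k,b} - \tilde{p}_b) + \mathbb{E}[s_{k,b}(\beta_k)\, \tilde{e}_{k,b}(\beta_k; \tilde{p}_b, \{\tilde{p}_{b,b'}\})]}{1 + \mathbb{E}[\log(s_{k,b}(\beta_k))]} \Big) \le 0, \quad \forall\, k \in \mathcal{U}^{(b)},\ b \in \mathcal{B}, \\
& \tilde{p}_{b'} = \tilde{p}_{b,b'}, \quad \forall\, b' \neq b \ \&\ b \in \mathcal{B}.
\end{aligned} \tag{50}$$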

where we use $\tilde{e}_{k,b}(\cdot, \cdot)$ to denote the MSE as a function of the
transformed variables. Note that (50) is a convex optimization
problem whose utility function is decoupled across TPs and
whose constraints are either also decoupled or are coupled
linear equality ones. Thus, a decomposition technique introduced
in [34] is now directly applicable and accordingly we
introduce a Lagrange multiplier for each equality constraint.
Each TP $b$ can then separately solve a convex sub-problem
and the multipliers can be updated using the sub-gradient
method in a distributed manner [34].
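
The consistency-price idea can be illustrated on a deliberately tiny problem. The sketch below is an assumption-laden toy, not the GP in (50): two hypothetical TPs each keep a local copy of a shared scalar, have quadratic local costs $(x - a_1)^2$ and $(x - a_2)^2$, and are coupled only through the linear equality $x_1 = x_2$. Each TP minimizes its own part of the Lagrangian in closed form, and the multiplier is updated with a (sub-)gradient step on the consistency violation. The function name `consistency_pricing`, the costs and the step size are all illustrative choices.

```python
def consistency_pricing(a1, a2, step=0.5, iters=100):
    """Dual decomposition for: min (x1 - a1)^2 + (x2 - a2)^2  s.t.  x1 = x2."""
    lam = 0.0  # consistency price (Lagrange multiplier of x1 - x2 = 0)
    x1 = x2 = 0.0
    for _ in range(iters):
        # Local sub-problems, solved independently given the current price.
        x1 = a1 - lam / 2.0  # argmin_x (x - a1)^2 + lam * x
        x2 = a2 + lam / 2.0  # argmin_x (x - a2)^2 - lam * x
        # Price update: ascent step on the dual, driven by the disagreement.
        lam += step * (x1 - x2)
    return x1, x2, lam

x1, x2, lam = consistency_pricing(0.2, 0.8)
```

In this toy case the two local copies agree at the average of `a1` and `a2` once the price has converged, and only the disagreement `x1 - x2` has to be exchanged between the two parties in each iteration.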


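A. $\alpha = 1$

AF optimization problem over the set of variables $p = \{p_b\}\ \forall\, b \in \mathcal{B}$ in the $\alpha = 1$ regime is given by

$$\underset{p \in [0,1]}{\text{maximize}} \ \Big\{ \sum_{b \in \mathcal{B}} \sum_{k \in \mathcal{U}^{(b)}} w_k \ln(R_{k,b}(p)) \Big\} \tag{51}$$

The problem of interest is equivalent to

$$\underset{p \in [0,1]}{\text{minimize}} \ \Big\{ \sum_{b \in \mathcal{B}} \sum_{k \in \mathcal{U}^{(b)}} w_k \ln\Big(\frac{1}{R_{k,b}(p)}\Big) \Big\} \tag{52}$$

As done in the case of $\alpha > 1$, we reduce (52) and fix $s, g$ to obtain

$$\begin{aligned}
\min_{p \in [0,1],\, t > 0} \quad & \Big\{ \sum_{b \in \mathcal{B}} \sum_{k \in \mathcal{U}^{(b)}} w_k \ln(t_{k,b}^{-1}) \Big\} \\
\text{subject to} \quad & \frac{t_{k,b}\, p_b^{-1} + \mathbb{E}(s_{k,b}(\beta_k)\, e_{k,b}(\beta_k, p))}{1 + \mathbb{E}(\log(s_{k,b}(\beta_k)))} \le 1 \quad \forall\, b, k
\end{aligned} \tag{53}$$

We consider the change of variables $t_{k,b} = \exp(\tilde{t}_{k,b})\ \forall\, b \in \mathcal{B}, k \in \mathcal{U}^{(b)}$ and $p_b = \exp(\tilde{p}_b)\ \forall\, b \in \mathcal{B}$. Let $a_{k,b} = \frac{1}{1 + \mathbb{E}(\log s_{k,b}(\beta_k))}$. Now (53) can be further reduced to

$$\begin{aligned}
\min_{\tilde{p} \le 0,\, \tilde{t}} \quad & \Big\{ \sum_{b \in \mathcal{B}} \sum_{k \in \mathcal{U}^{(b)}} -w_k \tilde{t}_{k,b} \Big\} \\
\text{subject to} \quad & \log\Big( a_{k,b} \exp(-\tilde{p}_b + \tilde{t}_{k,b}) \\
& \quad + a_{k,b}\, \mathbb{E}\big(s_{k,b}(\beta_k) \big( |g_{k,b}(\beta_k) \sqrt{\beta_{k,b}} - 1|^2 + |g_{k,b}(\beta_k)|^2 \big)\big) \\
& \quad + \sum_{b' \neq b} \exp(\tilde{p}_{b'})\, a_{k,b}\, \mathbb{E}\big(s_{k,b}(\beta_k)\, |g_{k,b}(\beta_k)|^2 \beta_{k,b'}\big) \Big) \le 0 \quad \forall\, b, k
\end{aligned} \tag{54}$$

Note that (54) is a convex optimization problem. Again, we use an alternating optimization approach to obtain the solution. We use the solution of (20) to minimize over $s, g$ when $p$ is fixed and further use (54) to minimize over $p$ when $s, g$ are fixed.

B. $\alpha < 1$

AF optimization problem over the set of variables $p = \{p_b\}\ \forall\, b \in \mathcal{B}$ in the $\alpha \in (0,1)$ regime is given by

$$\max_{p \in [0,1]} \ \Big\{ \sum_{b \in \mathcal{B}} \Big( \sum_{k \in \mathcal{U}^{(b)}} w_k (R_{k,b}(p))^{1/\alpha - 1} \Big)^{\alpha} \Big\} \tag{55}$$

Where $w_k = \ldots$ We choose $C = \sum_{b \in \mathcal{B}} \big( \sum_{k \in \mathcal{U}^{(b)}} w_k (\mathbb{E}(\log(1 + \beta_{k,b})))^{1/\alpha - 1} \big)^{\alpha}$. Now we use the reduction for $\alpha$ as done in (20)-(22) and further fix $s$ and $g$. We obtain the following optimization problem in variables $p, z, t$

$$\begin{aligned}
\min_{p \in [0,1],\, z > 0,\, t > 0} \quad & C - \sum_{b \in \mathcal{B}} z_b \\
\text{subject to} \quad & z_b^{1/\alpha} \le \sum_{k \in \mathcal{U}^{(b)}} w_k\, t_{k,b}^{1/\alpha - 1} \quad \forall\, b \\
& \frac{t_{k,b}\, p_b^{-1} + \mathbb{E}(s_{k,b}(\beta_k)\, e_{k,b}(\beta_k, p))}{1 + \mathbb{E}(\log(s_{k,b}(\beta_k)))} \le 1 \quad \forall\, b, k
\end{aligned} \tag{56}$$

Adding an extra variable $y$, the above problem (56) is equivalent to

$$\begin{aligned}
\underset{y > 0,\, p \in [0,1],\, z > 0,\, t > 0}{\text{minimize}} \quad & \{y\} \\
\text{subject to} \quad & \frac{C}{y + \sum_{b \in \mathcal{B}} z_b} \le 1 \\
& \frac{z_b^{1/\alpha}}{\sum_{k \in \mathcal{U}^{(b)}} w_k\, t_{k,b}^{1/\alpha - 1}} \le 1 \quad \forall\, b \\
& \frac{t_{k,b}\, p_b^{-1} + \mathbb{E}(s_{k,b}(\beta_k)\, e_{k,b}(\beta_k, p))}{1 + \mathbb{E}(\log(s_{k,b}(\beta_k)))} \le 1 \quad \forall\, b, k
\end{aligned} \tag{57}$$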


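To transform this optimization problem (57) into a GP, we
need to apply the single condensation method [35] on the
first two constraints of (57), which are of the form of a ratio
of a monomial and a posynomial. Let $X = (y, z)$ and
$t = \{t_{k,b}\}\ \forall\, b \in \mathcal{B}, \forall\, k \in \mathcal{U}^{(b)}$. For any current $\hat{X}, \hat{t}$ we
define

$$\tilde{f}(X) = \Big( \frac{y\, f(\hat{X})}{\hat{y}} \Big)^{\hat{y} / f(\hat{X})} \prod_{b \in \mathcal{B}} \Big( \frac{z_b\, f(\hat{X})}{\hat{z}_b} \Big)^{\hat{z}_b / f(\hat{X})} \tag{58}$$

Where $f(X) = y + \sum_{b \in \mathcal{B}} z_b$. We also define

$$\tilde{h}_b(t) = \prod_{k \in \mathcal{U}^{(b)}} \Big( \frac{t_{k,b}^{1/\alpha - 1}\, h_b(\hat{t})}{\hat{t}_{k,b}^{1/\alpha - 1}} \Big)^{w_k \hat{t}_{k,b}^{1/\alpha - 1} / h_b(\hat{t})} \tag{59}$$

Where $h_b(t) = \sum_{k \in \mathcal{U}^{(b)}} w_k\, t_{k,b}^{1/\alpha - 1}$. Then the following
approximate problem is a GP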

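$$\underset{X > 0,\ p \in [0,1],\ t > 0}{\text{minimize}} \ \{y\}$$

subject to

$$\frac{C}{\tilde{f}(X)} \le 1, \qquad \frac{z_b^{1/\alpha}}{\tilde{h}_b(t)} \le 1 \quad \forall\, b$$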

tk,bP b “t“ (/3fc )efc jb 1 p) ) 

1 +E(log(s fe! b(/3 fc ))) 
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